$24
You are to create a hash table to store the words contained in an input file; the resulting table must be at least 70% full. Create and implement your own hash function that minimizes the number of collisions; experiment until you find a function that gives you at least 50% hashing efficiency. You must use open addressing and linear probing to resolve collisions. Your program must also calculate and print out statistics about its efficiency for the given input.
Create a Java program that does the following:
1. Read in a set of words from the input file one at a time, in a loop.
2. Use your hash function to calculate the table address for the word, and then insert it into the table. Use open addressing and linear probing to resolve any collisions.
3. Once all the words have been inserted into the hash table, reread the input file, using each word as a search item, and search for the word in the table; you will use the same hash function and linear probing technique as above. As you do the search, keep track of the total number of reads you make as you search through the table, as well as the longest chain of reads you make as you search for a word. Calculate and print out to the output file the average number of reads per record, the load factor, the hashing efficiency, and the size of the longest chain when searching.
The program will be invoked from the command line as follows:
java Assign5 input output
where Assign5 is the name of the file containing executable bytecode for your program, input is the name of the input text file containing the list of words, and output is the name of the text file containing the statistics. If the command line arguments are improperly specified, the program should print out a "usage" message and abort. Be sure the specified files can be opened or created.
Create your own input files to test your code. An official input file will be made available to the class before the due date; use this file to create your output file. You must program all the data structures from scratch; in other words, do not use any classes from the Java libraries to implement the hash table.
Bonus (optional)
Create a second version of your program where you use quadratic probing instead of linear probing. Use the command-line flag -q to indicate this when invoking your program. Print out the second set of statistics and compare this technique with linear probing. Is the performance better or worse?
Questions
1. Briefly describe your hash function and how it works.
2. Analyze the quality of your hash function based on the statistics reported. How could you improve it?
3. Compare your hash function with an ideal hash function. How close is it to ideal performance?
4. If the input data set contained numerals (for example, student ID numbers) instead of alphabetical characters, how would it influence the performance of your hash function? Justify your answer with a brief analysis.
Submit electronically using the D2L dropbox:
1. A readme.txt text file, which indicates how to compile and run your program, and if you are doing the bonus part.
2. Your source code files. Your TA will run your program to verify that it works correctly. Make sure your Java program compiles and runs on the Computer Science Linux servers.
3. The output file(s) that your program generates. Name these output1.txt, and output2.txt (if doing the bonus part of the assignment).
4. The answers to the above questions, in Word or PDF format.
Collaboration
The assignment must be done individually so everything that you hand in must be your original work, except for the code copied from the text or course directory, or that supplied by your TA. When someone else's code is used like this, you must acknowledge the source explicitly. Copying another student's work will be deemed academic misconduct. Contact your TA if you have problems getting your code to work.