$29
Introduction:
Suppose we are inserting n keys into a hash table of size m. Then the load factor is de ned to be n=m. For open addressing n m, which implies that 0 1. In this assignment we will study how the load factor a ects the average number of probes required by open addressing while using linear probing and double hashing.
Design:
Set up the hash table to be an array of HashObject. A HashObject contains a generic object, a duplicate count and a probe count. The HashObject needs to override both the equals and the toString methods and should also have a getKey method.
Also we will use linear probing as well as double hashing. So design the HashTable class by passing in an indicator via constructor so that the appropriate kind of probing will be performed.
Choose a value of the table size m to be a prime in the range [95500 : : : 96000]. A good value is to use a prime that is 2 away from another prime. That is, both m and m 2 are primes. Two primes (di er by two) are called \twin primes". Please nd the table size using the smallest twin primes in the given grange [95500 : : : 96000]. Vary the load factor as 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.98, 0.99 by setting the value of n appropriately, that is, n = m. Keep track of the average number of probes required for each value of for linear probing and for double hashing.
For the double hashing, the primary hash function is h1(k) = k mod m and the sec-ondary hash function is h2(k) = 1 + (k mod (m 2)).
To test whether a number p is a prime, the following method should be used:
if ap 1 mod p 6= 1,8
<
if ap 1 mod p = 1,
:
then p is not a prime
then p is most likely a prime with a false positive chance of 10113
9
=
;
where a is a random integer with 1 < a < p. To increase the certainty of the test, please perform the test twice using di erent random numbers.
There are three sources of data for this experiment as described in the next section.
Note that the data can contain duplicates. If a duplicate is detected, then update the frequency for the object rather than inserting it again. Keep inserting elements until you have reached the desired load factor. Count the number of probes only for new insertions and not when you found a duplicate.
Experiment:
For the experiment we will consider three di erent sources of data as follows. You will need to insert HashObjects until the pre-speci ed is reached, where
{ Data Source 1: each HashObject contains an Integer object with a random int value generated by the method nextInt() in java.util.Random class. The key for each such HashObject is the Integer object inside.
{ Data Source 2: each HashObject contains a Long object with a long value gen-erated by the method System.currentTimeMillis(). The key for each such HashObject is the Long object inside.
{ Data Source 3: each HashObject contains a word from the le word-list in the directory
/home/JHyeh/cs321/labs/lab3/files
The le contains 3,037,798 words (one per line) out of which 101,233 are unique. The key for each such HashObject is the word inside.
Note that, for fair comparison, the data inserted into both linear and double tables must be the same.
When you hash a HashObject into a table index, you will need to
{ Compute the hashCode() of the key of the HashObject.
{ Use the hashCode() to perform the linear probing or double hashing calculation. Note that hashCode() can return negative integers. You need to ensure the mod operation in the probing calculation always returns positive integers. For example, the computation of the primary hash value (i.e., h1(key)) should be
primaryHashValue = key.hashCode() % tablesize; if (primaryHashValue < 0)
primaryHashValue += tablesize;
We need to do the same while calculating the secondary hash value (h2(key)).
Note that the modulus in h2(key) is tablesize - 2.
Note that two di erent objects (key objects) may have the same hashCode() value, though the probability is small. Thus, you must compare the actual key objects to check if the HashObject to be inserted is a duplicate.
Don’t copy the word-list le to your directory since its is large. Instead set up a symbolic link as follows:
ln -s /home/JHyeh/cs321/labs/lab3/files/word-list
Required le/class names and output:
The source code for the project. The driver program should be named as HashTest, it should have three (the third one is optional) command-line arguments as follows:
java HashTest <input type <load factor [<debug level]
The <input type should be 1, 2, or 3 depending on whether the data is generated using java.util.Random, System.currentTimeMillis() or from the le word-list. The program should print out the input source type, total number of keys inserted into the hash table and the average number of probes required for linear probing and double hashing. The optional argument speci es a debug level with the following meaning:
{ debug = 0 ! print summary of experiment on the console
{ debug = 1! print summary of experiment on the console and also print the hash tables with number of duplicates and number of probes into two les linear-dump and double-dump. The two sample dump les generated by executing [jhyeh@onyx sol]$ java HashTest 3 0.5 1
are given in the following directory /home/JHyeh/cs321/labs/lab3/files/linear-dump and /home/JHyeh/cs321/labs/lab3/files/double-dump
Please make sure your dump les have the same format as the provided sample dump les so that you can use linux command diff to compare them for correct-ness checking.
For debug level of 0, the output to the console is a summary. An example is shown below.
[jhyeh@onyx sol]$ java HashTest 3 0.5
A good table size is found: 95791
Data source type: word-list
Using Linear Hashing....
Input 1305930 elements, of which 1258034 duplicates
load factor = 0.5, Avg. no. of probes 1.5969183230332387
Using Double Hashing....
Input 1305930 elements, of which 1258034 duplicates
load factor = 0.5, Avg. no. of probes 1.3904918991147486
Submission
A readme le that contains tables showing the average number of probes versus load factors. There should be three tables for the three di erent sources of data. Each table should have eight rows (for di erent ) and two columns (for linear probing and double hashing). A sample result containing three tables can be seen in the le below /home/JHyeh/cs321/labs/lab3/files/sample result.txt
Please do not submit executable since I’ll be recompiling your programs.
Before submission, you need to make sure that your program can be compiled and run in onyx. Submit your program(s) from onyx by copying all of your les to an empty directory (with no subdirectories) and typing the following FROM WITHIN this directory:
submit jhyeh cs321 p3