$24
Objective
Build an open-addressed hashtable (with operations find, add, and delete) and test it with different probing schemes: (i) linear probing (ii) quadratic probing, and (iii) double hashing. In particular, for each probing scheme:
count the number of probes for each find (and average it over a number of find operations) when the mix of operations includes only find and add operations;
Repeat step (a) for mixes with different proportions of find and add operations (i.e. number of find operations varying from, say, 10% to, say, 90%).
Repeat step (b) for different load factors (from, say, 0.25 to, say, 0.99).
For high load factors (from, say, 0.6 to, say, 0.99) measure the number of clusters, the average size of clusters, and the maximum size of clusters; [Definition: A cluster is sequence of non-empty cells or buckets. End of Definition.]
Repeat steps 1.(b) and 1.(d) for the same hashtable after including delete operations in the mix:
Consider suitable proportions of the three operations while repeating (b);
Assume that delete merely marks the cell deleted and does not recover space but add may reuse the deleted space to insert an item.
Implement a sweep operation on a hashtable which scans the table and “pulls up” items in a chain following a cell marked delete. A sweep operation is typically used to minimize the chains formed due to probing. Consider a chain of three elements x, y and z, all of them gave the same output from a given hash function. So they must have formed a chain in the hash table resultant of probing (say linear probing) as shown below. If y gets lazy deleted and marked as delete, the probing chain is not contiguous anymore. Sweep operation essentially pulls up z into the location of y making the chain of probing contiguous.
x y z
x z
Implementation
Implement a hash table with open addressing with a “word (string)” as key. The choice of probing scheme (linear probing, quadratic probing, and double hashing) will be given as user input. Size (number of buckets) will be derived as 2t for a user specified value of t. Each entry of the hash table should be a character pointer, which will either point to NULL or to base address of the string, stored in an array of strings, which have been assigned to this index. Each entry of hash table may have additional fields other than this character pointer.
Hashing Function: int hash(str, t)
Required to hash a string str to an index in the range [0, 2t – 1].
/**** pseudocode for the hash function ******
1 | P a g e
First, the stringis converted to a 32-bit unsigned integer m using the following steps
Let be the length of .
−1=0
c. for = 0, 1 … . − 1, compute= (
−1
+ [ ])232
, where = 216
+
28 − 1 = 65791 and [ ] stands for ASCII value of the character.
= −1
3. Return most significant bits of ( ∗ 7) 232 /****end of hash function pseudocode********
Note: In case of double hashing, for second hash function, modify step 3 of above hash function to return least significant bits of ( ∗ ) 232. End of Note.
Structure of data type to store hash table
The data type for hash table should contain the following fields.
t : user given parameter
size : size of the table (size = 2t)
entries: number of elements in the hash table
loadFactor: a double variable storing the ratio = entries/size. Should be updated after each add and delete call.
findCount: number of times find function have been called (do not include the calls made from inside add and delete functions)
findProbes: number of times probing have been done (do not include the probing done during call to add and delete functions)
findRatio: a double variable storing the ratio = findProbes/findCount. Initial value = 0.0
Cluster Calculation
A cluster is sequence of non-empty cells or buckets. Given a hash table, some cluster information can be calculated as follows:
clusterCount: number of clusters in the hash table.
clusterAvgSize: average size of clusters. It can be calculated by dividing the sum of length of each cluster by clusterCount.
clusterMaxSize: maximum size of clusters
Input format
Each line will start with a one of the following key values (0, 1, 2, 3, 4, 5, 6, 7, or -1). Each key corresponds to a function and has a pattern. Implement following function according to given pattern and write a driver function which can call the functions according to given key value.
Ke
Function to
Input
Description
y
call
Format
1
readData
1 N
Read next N lines having a single word of at most 20 characters and store
str1
it in an array of strings, called as data. Allocate memory dynamically for
str2
this array.
…..
strN
2
createHashta
2 K t
2 shows creation of a hash table with open addressing.
ble
K represents probing scheme
0 linear probing
1 quadratic probing
2 | P a g e
2 double hashing
“t” will be a parameter for hashing function and size of hash table will be
2t .
It should only create an empty hash table. Each entry of the hash table
should be a character pointer, which will either point to NULL or to base
address of the string in data array, which have been assigned to this
index. Each entry of hash table may have additional fields other than this
character pointer.
3
find
3 i
Search for ith element of data array in the current hash table. Counting
starts from 0. It should return a pair <number of probes, character
pointer. Returning character pointer should either contain NULL or
address of cell containing the value.
4
add
4 i
Insert ith element of data array in the current hash table. There will be a
single hash table in a program. Counting starts from 0. It should call find
function to know if the value already exists.
5
delete
5 i
Delete ith element of data array from the current hash table by making
the character pointer point to NULL. Counting starts from 0. It should call
find function to know if the value already exists.
6
lazyDelete
6 i
To delete ith element of data array from the current hash table, do not
destroy the character pointer, instead, mark the cell as deleted. If a
subsequent probe from find reaches here, it should be treated as if the
value has NOT been deleted (but it should fail, if the value being searched
is the deleted value i.e. if probing stops here). But, if a probe from add
reaches here then the character pointer should point to base address of
new string in data array i.e. new value should get added here.
7
experiment
7 K
Next K lines, will contain calls to mix of add, find, delete, and lazyDelete
4 a1
functions (one function call in each line). After processing each line, print
4 a2
the value of loadFactor and findRatio in a new line (tab separated).
3 a3
After processing all K lines, calculate clusters and print clusterCount,
…
clusterAvgSize and clusterMaxSize in a new line (tab separated).
3 aK
8
clear
8
Clear the hash table (like it was at the end of call to createHashtable) and
restore member variables to their default values.
9
sweep
9
Apply sweep operation on the hash table.
-1
stop the program.
Test Case 1:
Input
Output
10 kuuxfit omqjn xqsdpag uekd zfnv hqnmzh nukxhjv byncerx
3 | P a g e
balbg
xwgojtn
2 0 4
12
1
5
9
0
2
5
3
7
1
8
6
6 -1
4 | P a g e