Starting from:
$30

$24

Choice Hashing Solution


Designing a perfect hash function is a non-trivial task. In addition, the performance of a hash function 
highly depends on the properties of data set. For “separate chaining” hash table, one goal of the hash 
function is to reduce the maximum length of the linked list among all the table bins. 
One simple variant of the above hash table is called “2-choice hashing”, where you use two 
independent hash functions (h1 and h2) instead of just one. For each object, you will use the two hash 
functions to give you two choices of positions in the single table. Please note this is different from 
“double hashing”.
Insert: With two bins to choose in the table, your strategy is to first check which bin contains the shorter 
linked list, then insert the new object to the front of that shorter linked list. Otherwise, if two linked lists 
are of the same length, you insert to the bin returned by your first hash function h1.
Search: like insert, you use the two hash functions to find out the two bin locations in the table, you will 
search both two bins in the table since you won’t know which bin contains your search object. 
Please note: the hash table should not contain duplicate objects.
It is possible that two hash functions may sometimes return the same bin position. Then just do insert 
and search only for that bin. That’s why we want the two hash functions to be independent, so that this 
won’t happen too regularly. 
For more details on “2-choice hashing”, please refer to https://en.wikipedia.org/wiki/2-choice_hashing
Data set: We will use the data file (Grocery_UPC_database.csv) provided. You will insert the items in 
the same order as in the file. Don’t modify this file in any way because your output will depend on that. 
Here are just some samples from it:
773743500068,Beveri Golden Flaxseed Fine Milled With Mixed Berries - 1 Lb
895172001487,Pure Life Shampoo Lavender - 14.9 Fl Oz
773743500051,Beveri Golden Flaxseed Fine Milled With Cranberry - 1 Lb
895172001432,Pure Life Body Lotion Coconut And Mango - 14.9 Fl Oz
2-choice hash functions: Each object contains the UPC number as upc and the description string as 
desc, we define two independent hash functions:
h1(upc) = upc % tableSize
h2(desc) = abs(desc[0]+27*desc[1]+729*desc[2]) % tableSize
We know the first 3 characters (could be number, letter, punctuation, or even space) of the description 
usually are from the brand of the item, so we expect lots of collisions for h2. We also know the UPC 
format’s last few digits are independent of brand. So h1 and h2 should be independent from each other. 
Table size: Your program will test different table sizes: 1000, 10,000, 100,000. With total around 
110,000 items from the data file, if we choose a table of size 100,000, then ideally, we hope to have 
each bin contains only one or two objects. If table size is 10,000, ideally the bin length should be around 
10 for each bin, and so on. Your code will report “standard deviation” (code provided) to describe the 
difference from the ideal cases.
Output: 
There are already some sample tests (using test_data.csv) included in the main function. 
You get 1 point after the sample tests pass.
After that, your program will be checked with the full database. It will print 3 standard deviations for 3 
different table sizes. Each correct output earns 4 points, total 12 points.
Your program will also print out 18 positions (6 test items for each table size), each correct position 
earns 1.5 points, total 27 points
1 + 12 + 27 = 40 point in total.

More products