$24
• Introduction
In this assignment you are going to practice on hash table implementation for an employee information system of a global company, along with its accompanying hash function using C++. Remember that insertion, deletion, and retrieval operations are to run in expected constant time for hash tables.
Keywords: Hash table, Separate chaining, C++
• Problem De nition
In this homework you will implement a hash table for an employee information system of a global company. The application must store the ssn(social security number) of each employee in the hash table. Note that, each employee has a unique ssn. Your application need to provide the following requirements:
Inserting an employee into the hash table Deleting an employee from the hash table Finding employee with the given ssn
Finding the number of employees in the given bucket
• Speci cations
You will be implementing a hash table class, named as EmployeeTable, all by yourself. The bare header le, "Employ-eeTable.h", is given below. You are free to add any private variables/functions to it. However, your hash table should support at least the given functions.
c l a s s I t e m N o t F o u n d E x c e p t i o n : p u b l i c e xc ep t io n f
p u b l i c :
what ( ) c o n s t throw ( ) f
c o n s t c h a r
r e t u r n
"Item Not Found i n t h e Hash Table ! " ;
g
• ;
1
c l a s s E m p l o y e e T a b l e f
p r i v a t e :
//
. . . members , methods
p u b l i c :
//
C o n s t r u c t o r s & D e s t r u c t o r ,
be
c a r e f u l
about
memory
l e a k s .
E m p l o y e e T a b l e ( ) ; //
D e f a u l t
t a b l e
s i z e
i s 151
b u c k e t s .
E m p l o y e e T a b l e ( i n t n u m b e r O f B u c k e t s ) ;
//
numberOfBuckets i s a
prime number .
E m p l o y e e T a b l e ( c o n s t E m p l o y e e T a b l e& empTable ) ;
~ E m p l o y e e T a b l e ( ) ;
E m p l o y e e T a b l e& o p e r a t o r =( c o n s t
E m p l o y e e T a b l e&
rhs ) ;
//
Hash an
employee ’ s s s n t o
a
non
n e g a t i v e
i n t e g e r (
r e t u r n
hash va lu e , not
//
bucket
number ) . Use
hashValue (mod
numberOfBuckets )
a s bucket
number .
//
Return
1 on i n v a l i d
s s n .
i n t
h a s h E m p l o y e e ( string ssn ) ;
//
Put
an
employee t o t h e hash
t a b l e .
// Do n o t h i n g on i n v a l i d p a r a m e t e r s .
v o i d addEntry ( Employee& employee ) ;
//
Remove
t h e e n t r y from t h e
hash
t a b l e .
// Do n o t h i n g on i n v a l i d p a r a m e t e r s .
v o i d r e m o v e E n t r y ( Employee& employee ) ;
//
Get
t h e
t o t a l number
o f employees
i n
t h e g i v e n
bucket .
//
Return
1 on i n v a l i d
bucket
number .
i n t g e t N u m b e r O f E m p l o y e e I n B u c k e t ( i n t
bucket ) ;
//
Find employee with t h e g i v e n s s n .
//
Important : To add an
e x t r a
f l a v o r
t o
your homework ,
your
code
//
w i l l be
e v a l u a t e d by
how
much
time
i t
t a k e s
t o
a c c e s s an
employee
//
on
t h e
a v e r a g e . You
w i l l
g e t
a
s m a l l
p o r t i o n
o f
your g r a d e
//
depending on t h e a v e r a g e
a c c e s s
time
o f
your
code .
//
For
i n v a l i d c a s e s , throw
ItemNotFoundException .
Employee
f i n d E m p l o y e e ( string
ssn ) ;
◦ ;
An employee’s attributes will be represented with the Employee class. "Employee.h" is given below. It should not be modi ed in any case.
c l a s s Employee f
p r i v a t e :
string name ; // name o f t h e employee . I n v a l i d : ""
i n t e x p e r i e n c e ; // e x p e r i e n c e o f t h e employee i n terms o f months . I n v a l i d : 1
string city ; // c u r r e n t working p l a c e o f t h e employee . I n v a l i d : ""
string ssn ; // s s n o f t h e employee . I n v a l i d : ""
p u b l i c :
• C o n s t r u c t o r s Employee ( ) ;
Employee ( string name ,
i n t experience , string
city , string ssn ) ;
//
G e t t e r s and s e t t e r s
string getName ( ) ;
v o i d setName ( string
name ) ;
i n t g e t E x p e r i e n c e ( ) ;
v o i d s e t E x p e r i e n c e ( i n t e x p e r i e n c e ) ;
string getCity ( ) ;
v o i d setCity ( string city ) ;
string getSsn ( ) ;
v o i d
setSsn ( string ssn ) ;
//
c h e c k s
whether g i v e n
two Employees a r e same
o r not .
g;
b o o l
compare ( Employee employee ) ;
2
You should try to nd a hash function that distributes the employees as uniformly as possible to the hash table (hash table’s buckets). The ultimate goal - which may not possible - is to create a hashing that aims at most 1 employee per bucket. You will be graded on how close you are to this uniformity. You are strongly encouraged to use \ssn" when designing your hash function.
You will implement \EmployeeTable.cpp", and will use separate chaining as collision resolution strategy. Employees will be \chained" to co-exist in the same bucket.
Note that, employee with same name, experience or city may exist in test cases. However, you can assume that ssn of each employee will be unique.
Sample datasets will be given to you, each including name, experience and city for every employee, as well as their ssn. Test datasets will resemble the given datasets, however, they will also include new employees. So, you simply can not design a perfect hash function that assumes all keys and values are known a priori. During testing, after inserting all values, we’ll request a number of employees from the hash table.
If you have a high number of collisions, the time it takes for your code to fetch an employee increases. Try to reduce the number of collisions as much as possible for the given datasets. Your responsibility in this context includes providing the correct number of employees in a bucket, and of course, returning them correctly.
You will analyze the success of your implementation by evaluating it for the given datasets, and submit a report including the analysis. Please read "Report" section for details.
Dynamic resizing/rehashing is not allowed. Your work will be inspected to ensure fairness.
80% of your grade will be based on the correctness of your solution, while the remaining 20% will be related to the report and performance of your code.
• Dataset
You will be provided a number of datasets (Company1.txt - ... - CompanyM.txt), each le representing a new employee table used for several companies. The le format is given below:
167 // S i z e
o f
employee
t a b l e .
Employees
Alice // Name o f
f i r s t
e n t r y ( employee ) .
30 // E x p e r i e n c e o f f i r s t e n t r y .
New York //
City
o f
f i r s t
e n t r y .
0905552321
//
SSN
o f
f i r s t
e n t r y .
. . .
. . .
Belle // Name
o f
Nth
e n t r y .
35 // E x p e r i e n c e o f Nth e n t r y . Oslo // City o f Nth e n t r y . 0119120121 // SSN o f Nth e n t r y .
SSNs
00048203525 // SSN 1 .
00798448113 // SSN 2 .
. . .
. . .
06498095975 // SSN K.
You will test your code yourselves by reading the input and performing the required operations in your main function. For each entry in \Employees" list, you will call \AddEntry(...)" function after forming the \Employee" object with required parameters. Similarly, each entry in \SSNs" list requires a call of \FindEmployee(...)" function. Please test all of your functions to verify that they work correctly. The time performance test will only take \FindEmployee()" calls into account.
3
• Report
You should create a new hash table with the speci ed size in each dataset, and hash all entries in \Employees" list. Then, you will make the speci ed calls. Your report should include your table’s performance on each dataset.
The report you will submit should have the following format:
1. Time Performance: Report average access time for all sample datasets (Only take \FindEmployee()" calls into account). To measure time, form a loop and do all access calls successively. Make comments on why certain datasets’s performance are di erent than others. If you have made any optimizations in your code, you can mention them here. To measure time, you can use \clock()" function in \time.h".
2. Hash Function Quality: Count average number of collisions for all given datasets in \FindEmployee()" calls. Explain the nature of the data, and how you improved your hash function.
3. Bucket Statistics: List average number of entries in non-empty buckets (along with total number of non-empty buckets) for each dataset. Compare them with the ideal values. Comment on the results.
• Regulations
1. You will use C++.
2. If your submission fails to follow the speci cations or does not compile, there will be a "signi cant" penalty of grade reduction.
3. Late Submission: Late submission policy is stated in the course syllabus.
4. Cheating: We have zero tolerance policy for cheating. In case of cheating, all parts involved (source(s) and receiver(s)) get zero. People involved in cheating will be punished according to the university regulations. Remember that students of this course are bounded to code of honor and its violation is subject to severe punishment.
5. Newsgroup: You must follow the Forum (in Odtuclass) for discussions and possible updates on a daily basis.
• Submission
1. Submission will be done via Moodle (cengclass.ceng.metu.edu.tr).
2. Do not write a main function in any of your source les.
3. A test environment will be ready in Moodle.