$29
Objectives
Multithreaded programming with Pthreads (POSIX threads). Practicing synchronization, use of lock variables.
Designing and performing experiments; applying probability and statistics knowledge.
You can do this project in groups of two students each. You will use C and Linux.
Thread-safe Hash Table Library
In this project you will rst develop a library (libhash.a) that will implement thread-safe hash table data structure and its operations. A multi-threaded applica-tion using the library will be able to create one or more hash tables and access them from multiple threads running concurrently. The library code will sit in a le called hash.c and the respective header le (i.e., library interface) will be hash.h.
A hash table has N buckets (buckets 0 through N-1), i.e., N entries. A key i and an associate value (data), i.e., a key-value pair, will be inserted into bucket j
hash(i), where j is in range [0, N-1]. In this project, the hash function will be a simple hash function, i.e., hash (i) = i mod N. The key type is integer and valid keys are positive. A key value 0 is not valid. The value type will be void *. With proper type casting, you can store an integer value in a eld that is of type void *. Multiple key-value pairs mapping to the same bucket will be added to a linked list (chaining). In this way collisions will be resolved. Hence, for each bucket of the hash table we will have a linked list, initially empty.
A hash table will be protected by multiple locks to reduce lock contention while accessing the hash table from multiple threads. There will be one lock per M consecutive buckets in the hash table. We call such M consecutive buckets as a region. There will be N/M = K regions, hence K locks. The rst M consecutive buckets is the region 0 and is protected by lock 0; the next M buckets is region 1 and protected by lock 1, and so on. While doing an operation on the hash table (like insert, delete, get) and accessing a bucket, the corresponding lock must be acquired.
N can be a value between 100 and 1000. M can be a value between 10 and 1000. M should be N and N is a multiple of M. K can be a value between 1 and 100.
1
Your library will implement the following functions. It will de ne a HashTable type as well, so that applications can use this data type.
HashTable *hash init (int N, int K). Creates a hash table of N buckets protected by K locks. Each bucket will have an associated linked list (chain) initialized to empty list. Returns a pointer to the hash table created if success; otherwise returns NULL.
int hash insert (HashTable *hp, int k, void * v). Inserts key k and the associated value v into the hash table hp. Value is a pointer that can point to a structure containing data. The allocation and deallocation of memory for the structure is left to the application using the library. If the data is a simple integer, then it can be directly stored as the value (i.e., no need to separately allocate memory for the integer and store a pointer to that as the value). We are assuming void * and integer type has the same size. This function returns 0 if success; otherwise returns -1. If key already presents, does nothing and returns -1.
int hash delete (HashTable *hp, int k). Removes key k and the associated value v from the hash table hp. If success returns 0, otherwise returns -1.
int hash update (HashTable *hp, int k, void *v). Updates the value of key k to be v. If success returns 0, otherwise returns -1.
int hash get (HashTable *hp, int k, void **vp). The value associated with key k is retrieved into a pointer variable (of type void *) whose address is vp. If success returns 0, otherwise -1.
int hash destroy (HashTable *hp). Destroys the hash table and frees all resources used by it.
A multi-threaded application, for example x.c, that will use your library will rst include the header le hash.h corresponding to your library. The application may create many threads and each thread may do a lot of table operations. An application will be compiled and linked with your library as follows:
gcc -Wall -o x -L. -lhash x.c
Develop a Multi-threaded Application
Implement a multi-threaded integer-count program. The program will take C text les as input. Each input le may contain a large number of positive integers. Repetitions are allowed. The program will count the number appearances of unique integers. In other words, for each integer appearing in one of those C les, it will nd out how many times that integer occurs in those les. Your program will use a separate thread for each le. The threads will use a shared hash table to do counting. That means a key-value pair in the table will be an integer and its count. At the end, the program will write the integers in ascending sorted order to an output le. The output le will have a separate line for each unique integer. A line will contain an integer and its count separated by a colon as in the following output example:
3
1
2
4
2
1
A sample invocation of the program is shown below:
integer-count 7 1.txt 2.txt 3.txt 4.txt 5.txt 6.txt 7.txt out.txt There are 7 input les. The program will use 7 new threads. Each input le
may contain an unsorted sequence of positive integers (keys). Each line of input will contain a single integer.
Experiments and Report
Assume we are wondering about the e ect of the number of locks used (K) on the performance of the hash table operations. If there is one lock used, there will be high contention for it from multiple threads trying to access the hash table. Write an application (test.c) and design some experiments to analyze the e ect of number of locks (K) on the performance. Performance metric can be, for example, the time to execute a set of operations on the hash table for a set of threads. Try to nd out if you can see a performance increase while K is increased. Measure the time. Make a lot of experiments with di erent parameter values: number of threads (T ), number of keys/operations (W ), number of locks (K), table size (N). Try to see the e ect of number of threads (T ) on the performance as well. When T is 1, there will be no contention. Contention will increase when there are more keys be operated with and when there are more threads used. Show your numeric results in tables or graphs and try to interpret. Put your results, your interpretations, discussions into your report.
Submission
Put your report.pdf le and your program les (hash.c, hash.h, integer-count.c, test.c) and a Make le and a README.txt le into a directory named with your ID (for a group, a single le will be uploaded using the ID of one of the students). Then tar and gzip the directory. For example a student with ID 21404312 will create a directory named 21404312 and will put the les there. Then he will tar the directory (package the directory) as follows:
tar cvf 21404312.tar 21404312
Then he will gzip the tar le as follows:
gzip 21404312.tar
In this way he will obtain a le called 21404312.tar.gz. Then he will upload this le in Moodle.
Late submission will not be accepted (no exception). A late submission will get 0 automatically (you will not be able to argue it). Make sure you make a submission one day before the deadline. You can then overwrite it.
3
Tips and Clari cations
You need to learn how to use Pthreads mutex variables. There are links to some resources in the References section of the course webpage. You can nd additional resources from Internet.
A project 2 skeleton of code is posted in github: https://github.com/korpeoglu/cs342spring2019 p2
You can clone it and start with it. At least, it will help you with the compi-lation of the library and programs.
You will include the Make le with your project submission. Make sure it works in your environment (name your les accordingly). We will just type make and your programs should compile.
We may put clari cations to the homepage of the course (near the project assignment).
You are suggested to test your hash table library rst by developing simple applications. Then you can start developing integer-count application.
You can use mutex locks in the application code as well for integer-count application.
You can not change the interface (hash.h).
Our test programs will destroy the hash table after all threads nished using it.
4