$24
Collaboration Policy. Homeworks will be done individually: each student must hand in their own answers. It is acceptable for students to collaborate in understanding the material but not in solving the problems or programming. Use of the Internet is allowed, but should not include searching for existing solutions.
Under absolutely no circumstances code can be exchanged between students. If some code was shown in class, it can be used, but it must be obtained from Canvas, the instructor or the TA.
Assignments from previous offerings of the course must not be re-used. Violations will be penalized appropriately.
Late Policy. No late submissions will be allowed without consent from the instructor. If urgent or unusual circumstances prohibit you from submitting a homework assignment in time, please e-mail me.
Problem 1. Out-of-core Merge Sort (100 points) Create a program that implements the merge sort algorithm without storing all the data in memory. This will be accomplished by using files to store intermediate data. Note that for the merging step of merge sort, only the current element of each of the two arrays needs to be available at a given time.
Your program should be called oocmerge and take two mandatory command line arguments:
oocmerge <N <output filename
N is the number of floats to be sorted, which will be randomly generated. The output filename is of the file that will store the sorted array of floats (in binary format).
This algorithm is useful in practice for sorting massive amounts of data that cannot fit in RAM. (It would be combined with an in-core algorithm like selection sort in practice, but this is out of scope here.)
Requirements.
The program should begin by generating N floating point numbers between -100 and 100 using rand(). Floats can be generated by dividing the output of rand(), which is an integer, by RAND MAX and then transforming the result appropriately. Do not generate integer values only.
Assume N is positive and less than 1000 and that sorting will be in ascending order.
The program should NOT allocate any arrays statically or dynamically. All data should be held in files in binary format.
The program should create a directory named temp. The permissions of this directory should be changed so that it is completely inaccessible to anyone but the user.
1
Temporary files storing intermediate arrays to be merged should be written in the above directory. To avoid mistakes, the permissions of these files should be changed so that they can only be written when they are used as output and they can only be read from when they are used as input. Anyone but the user should not have access to them.
Temporary files should be deleted when they are no longer needed.
The program should end by verifying that the numbers in the output file are indeed sorted. This should be done also without storing the more than the minimum amount of data neces-sary for the verification in RAM.
Hints.
Worry about permissions after your code works, but do not forget to implement this part even if your code does not completely work.
I would start by implementing the flow of the algorithm without the actual sorting. Print useful messages for debugging, such as: ”Round R: merging arrays A and B.”
See how to use sprintf() to generate the filenames of the temporary files. For example sprintf(fname, "array%d.dat", 5);
There is a recursive and an iterative implementation. If you pick the former, make sure that it terminates. See second hint and start with 8 numbers.
Think about what the split step actually does.
Deliverables. A zip/tar/gz file containing:
Sources files, including at least one header file.
The makefile.
Everything must work on the virtual machine we will provide.
No report is required, but please include one in pdf format if there is anything that we should be aware of.
2