Starting from:
$35

$29

HW 3: Bash Scripting Solution

Overview: The last homework had you create a pipeline of utilities to locate  les with the same content.

This homework will have you perform this task again using features of bash.

Objective: Write a bash program that nds all the les with the same md5sum hash, sorts them by the size of the group sharing that hash, and prints out the groups as a comma-separated list.

Guidelines:

You must do this by splitting the program into three functions:

{ The rst function, find_and_count_duplicates , will be for listing the les ( nd), determining their md5sum (md5sum), and creating a few arrays. One array will map the hash to all the les that have that md5sum. Another array will store a count of the number of duplicates for each md5sum.

{ The second function, assign_md5sums_to_group_sizes , will have you create an array that will store all the md5sum hashes for a single group size.

{ The third function, print_group_sizes , will be to iterate over the group sizes, and print the les for each md5sum group of that size.

You must comment some of your code in each function (use # before text). You must use at least one for loop and at least one while loop.

Restrictions:

No credit will be given if you use the method from solution 2.

The only commands you are permitted to use outside of bash’s builtin functionality are find, md5sum , sort, and tr (once each).


Helpful hints:

Command substitution is one way of initializing an array

To use a parameter as the key of an associative array, you will need to quote that variable.

Printing after each step to diagnose problems early on. Colorizing the input can also help you debug.


Bonus 1: Use the ANSI escape codes to print out in color. This will make output much easier to read.

Hint: These look like



1

# Regular Colors

Black= \033[0;30m    # Black

Red= \033[0;31m    # Red

Green= \033[0;32m    # Green

Yellow= \033[0;33m    # Yellow

Blue= \033[0;34m    # Blue

Purple= \033[0;35m    # Purple

Cyan= \033[0;36m    # Cyan

White= \033[0;37m    # White


Bonus 2: Print out the directory of the  les in one color, and the actual  lenames in another color.

Bonus 3: Is this faster or slower than the rst homework 2 solution? Compare the time taken for both on di erent numbers of unique les using the time command. Test for 5000 unique les, 20000 unique les, and 50000 unique les. You may use the command:


for i in    seq <filecount> ; do echo $i >> $i; done

to create unique    les.

Submission:

Upload a single text    le (.txt) containing all of your answers to Canvas by the due date.




































2

More products