Software Systems Final Project

Project Synopsis

For your final project, you will implement a software solution that consists of shell scripts and a C program to compute the lab attendance associated with each student.

Exercise 1 - Version Control (2 Points)

Create a local (private) git repository in your home directory. You will be using this for the rest of your final project work to write any shell scripts, C programs, etc. DO NOT use public repositories such as GitHub.

We expect to see at least 4 commits, each at least 10 minutes apart. So commit your work frequently as you reach some logical milestone of your project work. No points will be allocated for this exercise unless this requirement is completely met. You must turn in your git log command output copied/redirected as a git.txt file, once you have finished your complete project work.

Exercise 2 - A Shell Script to Pre-process the Information in the CSV Files. (6 Points)

At the end of a lab session, TAs will download a CSV file that contains the attendance information recorded by Zoom into their respective folders. Let us start by taking a peek at the directory. The output below is truncated for brevity.


$ tree LabAttendance
LabAttendance
|-- lab1
|   |-- lab-A.csv
|   |-- lab-B.csv
........
|   |-- lab-I.csv
...
|-- lab3
|   |-- Lab-A.csv
...
|-- lab4
|   |-- LAB-A.csv
......
|   |-- LAB-I.csv
|
|-- lab5
......
|-- lab6
...
|-- lab7
    |-- lab-a.csv
...
    |-- lab-i.csv

As we can see, each of the seven lab groups (1 through 7) has an attendance file for each of the nine labs (A through I). While the CSV filename conventions are consistent, their case (uppercase/lowercase) can be in different combinations depending on how the TA named their files.

Next, let us take a peek into some of these CSV files. (These are synthetic data, not actual student names.)


$ head -3 lab1/lab-A.csv
Name (Original Name),User Email,Total Duration (Minutes),Guest
Sharda Freedman,sharda.freedman@mail.mcgill.ca,64,No
Sterling Boone,sterling.boone@mail.mcgill.ca,64,No


$ head -3 lab5/Lab-A.csv
Name (Original Name),User Email,Join Time,Leave Time,Duration (Minutes),Guest
Mack Boyd,mack.boyd@mail.mcgill.ca,01/21/2021 08:58:30 PM,01/21/2021 10:17:06 PM,79,No
Chung Tibbs,chung.tibbs@mail.mcgill.ca,01/21/2021 08:58:31 PM,01/21/2021 10:03:25 PM,65,No



The CSV files can have one of two possible formats (each file follows one or the other, not a mix). The second format contains the join time and leave time as extra attributes compared to the first format.

Your first task is to collect the information from all these CSV files and build a consolidated CSV file that contains the records in a common format.

    1. Create a shell script fixformat.sh to do this task. You can use any shell/Unix commands already available in mimi to implement this script (other than sorting).

    2. The shell script expects two arguments, the first being the directory under which it should recursively search for the CSV files that follow the naming convention mentioned above. You should find such CSV files no matter which subdirectory they are stored in (i.e., subdirectory names and depths should not influence the outcome). In other words, even if we add a new lab group, etc., it should not impact your script. Your shell script’s objective must be to look for the CSV files that follow the naming convention anywhere under that directory hierarchy. The second argument is the CSV file into which the script will store the re-formatted (and consolidated) output that it processed from all the input CSV files. Either of these arguments could be absolute or relative paths.

    3. (1 Point) If the shell script is not invoked with sufficient arguments, display a usage message and terminate with code 1.


$ ./fixformat.sh
Usage fixformat.sh <dirname> <opfile>

$ echo $?
1

    4. (1 Point) If the first argument is not the name of an existing directory, the script should display an error message and terminate with code 1.


$ ./fixformat.sh /data/LabAttendance /data/labdata.csv
Error /data/LabAttendance is not a valid directory

    5. Do not create an output file in the error/usage situations described above. (-1 Points).

    6. (4 Points) When invoked correctly, the script will find all the CSV files under the given directory hierarchy that follow the naming convention (mentioned previously), reformat the information and consolidate it into a single output CSV file, after which it will terminate with code 0.


$ ./fixformat.sh /data/LabAttendance /data/labdata.csv
$ echo $?
0

A sample of the output is given below.


$ head -3 /data/labdata.csv
User Email,Name (Original Name),Lab,Total Duration (Minutes)
usha.rush@mail.mcgill.ca,Usha Rush,D,64
bessie.thompson@mail.mcgill.ca,Bessie Thompson,D,58

There is a header at the top of the file, followed by the actual information records. The shell script has done the following transformations.

    (a) For each record, it included only the email, name and duration information from the original CSV files.

    (b) To each record, it added a new column, Lab, which it derived from the name of each CSV file and added to the records that it read from that CSV file (the values of this column represent the labs A through I). The values for this column should be uppercase letters irrespective of the case of the CSV filenames.

The output produced by your shell script should follow the same exact formatting (order of columns, etc.). There is no particular order in which the actual attendance information records are to be stored in the CSV file. Deliberately sorting attendance records when producing the output is not allowed and can result in a 0 for this exercise. Sorting will be the task of the C program in the next exercise.

    7. Existing output files must be overwritten by your script (not appended).

    8. You need not handle any other error/failure situations other than those mentioned above.

    9. You do not have to handle conditions where a directory may not have any relevant CSV files or where the CSV files do not have headers/records (i.e., any file that matches the naming convention will be a valid CSV file).

Exercise 3 - A C Program to Compute the Attendance. (10 Points)

Now that we have done some preliminary cleaning of the attendance data, it is time to compute the attendance associated with each student. At a high level, we first need to figure out, for each student, how much time (duration) they attended a lab. Keep in mind that even for a given lab (say lab D), a student may have multiple records. This is because the student might have been disconnected and had to connect back (which results in multiple Zoom records for the same meeting). Another reason could be that the student could only attend part of the lab for their group (say lab 2) and had to catch up in another TA’s lab (say lab 5). In all of these cases, our objective is to find the net amount of time (duration) that the student spent on a given lab (say lab D).

    1. Use vim to create C program files labapp.c, zoomrecs.c, zoomrecs.h and a makefile. You are not allowed to create any other C or H files for this exercise.

    2. You can use any functions and features available in the C libraries stdio.h, string.h and stdlib.h, but your implementation should be solely in the C language (i.e., you should not, for example, use the system function call to execute part of your work using another Unix command).

    3. The executable of the program should be named labapp. It will receive two arguments, an input CSV file and an output filename for the program to write its output to. Either of these arguments can be an absolute path or relative path. Below is an example of its execution syntax.


 $ ./labapp /data/labdata.csv /data/attendance.csv

    4. zoomrecs.h must contain the following C struct definition. No changes are allowed to this structure.

struct ZoomRecord
{
    char email[60]; // email of the student
    char name[60]; // name of the student
    int durations[9]; // duration for each lab.
    struct ZoomRecord *next;
};

This will serve as the "Node" of your linked list. As can be seen, the structure is used to record a student’s email (unique to a student), name and the durations associated with each of the nine labs.

    5. zoomrecs.c will contain at least two functions (you can add more as needed for your logic), addZoomRecord and generateAttendance. These two functions are intended to be called from the code that is in labapp.c.

    6. addZoomRecord is called once for EACH line (record) of lab attendance that is read from the input file. The function will search for the student’s information in a linked list (using email as the search attribute). If found, it will update/increment the duration associated with that lab for that student. If the student is not found, it will create a new ZoomRecord for the student, and add it to the linked list such that the list is maintained in order of email ids (and update the relevant lab’s duration).

    7. generateAttendance is called ONCE, after all of the input information has been read into the linked list. It will then read through the linked list (which is now in the order of student email ids) and write to the output file the detailed attendance information associated with each student (thus the output is now sorted alphabetically by email id and has one record per student).

Below is an example format of the output (only parts of it are shown for brevity).

$ cat /data/attendance.csv

User Email,Name (Original Name),A,B,C,D,E,F,G,H,I,Attendance (Percentage)
adeline.larsen@mail.mcgill.ca,Adeline Larsen,62,57,0,0,45,51,58,60,60,77.78
...
melody.cohen@mail.mcgill.ca,Melody Cohen,65,58,57,64,43,50,53,60,59,88.89
...

The output contains a header followed by actual attendance information (ordered by the student email ids). Those records contain the email and name of a student, followed by the duration (nine columns) associated with each of the labs A through I. If a student does not have a Zoom record for a particular lab (in the above example, Adeline did not attend labs C and D), then it should be recorded as 0 duration. The last column is the calculated attendance. For this purpose, you count the number of labs where the student spent 45 minutes or more in the specific lab (in total) and calculate the percentage. In the above example, Melody spent only 43 minutes in lab E, and therefore, her attendance would be 8 out of 9, calculated as 88.89 percent.

Your program should follow the above output formatting style and keep the attendance percentage to two decimal places (i.e., its values would be from 0.00 through 100.00). Existing output files must be overwritten.
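The percentage rule described above can be sketched as follows. This is only an illustration under stated assumptions: the helper names attendance_percentage and format_percentage, and the macros NUM_LABS and MIN_MINUTES, are hypothetical and not part of the required design.

```c
#include <stdio.h>

#define NUM_LABS 9      /* labs A through I */
#define MIN_MINUTES 45  /* threshold stated in the project description */

/* Hypothetical helper: percentage of labs in which the student
   accumulated at least MIN_MINUTES in total. */
double attendance_percentage(const int durations[NUM_LABS])
{
    int attended = 0;
    for (int i = 0; i < NUM_LABS; i++) {
        if (durations[i] >= MIN_MINUTES)
            attended++;
    }
    return 100.0 * attended / NUM_LABS;
}

/* Hypothetical helper: writes the percentage into buf with two decimals. */
void format_percentage(double pct, char *buf, size_t size)
{
    snprintf(buf, size, "%.2f", pct);
}
```

With Melody's durations from the sample output (only lab E is under 45 minutes), 8 of the 9 labs count, which formats as 88.89.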

    8. labapp.c will contain the main of your program. It will call addZoomRecord and generateAttendance as needed (either directly or through other functions in labapp.c). Keep in mind that this may require you to put some additional information into zoomrecs.h to make it work. You have to figure that out based on modular programming concepts discussed in class.

    9. The finer details of the rest of the program logic are up to you, including the arguments to be passed to these functions, adding other (helper) functions if required, etc. But the high-level design described above should be maintained and the attendance information must be maintained and processed using the linked list. Duplicating the attendance data from the linked list to a new array, etc., is not acceptable. You have to demonstrate your ability to work with linked lists. Not following this would result in a 3-point deduction.

    10. You are not required to perform any explicit error checks in the C program or implement other capabilities not explicitly stated. However, it is highly recommended to implement some of the commonly used checks to help reduce your development and debugging effort.

    11. Ensure that your program frees up any dynamic memory allocated to it once it is done with its use and before terminating the program. (-2 points)

    12. (7 Points) For producing the correct output.

    13. (-3 Points) If the program crashes at any point (only valid inputs will be used for testing).

    14. (1.5 Points) Make sure that your source code is well commented and follows the basic modular programming principles covered in class.

    15. (1.5 Points) Write a proper makefile for your C program. Make sure that it compiles each part only when that part is actually impacted by source code changes, following the principles discussed in class. TAs will compile your program by executing the command make in the directory that has your source code and makefile. If it does not build your executable program, no points will be given for this exercise. Points are also deducted for any unnecessary compilation steps (they defeat the purpose of a makefile).
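The ordered-insert behaviour described for addZoomRecord can be illustrated with a minimal sketch. Everything about the interface here is an assumption, since the arguments are left to you: this version takes the list head and returns the (possibly new) head, and freeZoomRecords is a hypothetical helper name. The struct is repeated only to keep the example self-contained; in the project it belongs in zoomrecs.h.

```c
#include <stdlib.h>
#include <string.h>

struct ZoomRecord
{
    char email[60];
    char name[60];
    int durations[9];
    struct ZoomRecord *next;
};

/* Assumed signature: returns the head of the list after the update. */
struct ZoomRecord *addZoomRecord(struct ZoomRecord *head, const char *email,
                                 const char *name, char lab, int duration)
{
    struct ZoomRecord **link = &head;

    /* Advance to the first node whose email is not smaller than the new one. */
    while (*link != NULL && strcmp((*link)->email, email) < 0)
        link = &(*link)->next;

    if (*link == NULL || strcmp((*link)->email, email) != 0) {
        /* Student not found: splice a zero-initialised node in at this spot. */
        struct ZoomRecord *node = calloc(1, sizeof *node);
        if (node == NULL)
            return head; /* allocation failed; list left unchanged */
        strncpy(node->email, email, sizeof node->email - 1);
        strncpy(node->name, name, sizeof node->name - 1);
        node->next = *link;
        *link = node;
    }

    /* Labs A through I map to indices 0 through 8. */
    if (lab >= 'A' && lab <= 'I')
        (*link)->durations[lab - 'A'] += duration;
    return head;
}

/* Hypothetical helper: releases every node before the program terminates. */
void freeZoomRecords(struct ZoomRecord *head)
{
    while (head != NULL) {
        struct ZoomRecord *next = head->next;
        free(head);
        head = next;
    }
}
```

Walking a pointer-to-pointer avoids a special case for inserting at the front of the list; other designs (for example, passing the address of the head pointer) would work equally well.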






Exercise 4 - GDB. (4 Points)

This is a theoretical exercise. The response to this exercise must be turned in as gdb.txt as a combination of textual explanation (of what you are doing and why) and gdb commands. Your responses must be based on your C program.

    1. Now imagine that your C program crashed (abort) while executing. You suspect that it is crashing inside the function generateAttendance. Describe what gdb commands (from compiling to executing the program) you will use, so that gdb will automatically stop execution at the beginning of the said function (using the fewest steps to get there).

    2. At this point, since you are not sure which part of the code is causing the issue, describe the sequence of commands that you will be executing. Continue the above investigation till you reach the part of your code where you are accessing the first node of the linked list (as passed or accessed by generateAttendance or its helper functions) for the first time. Print the address pointed to by that particular variable.

You may end the gdb session at this time. (We will imagine the issue was that somehow that pointer was NULL which your code was not handling correctly. But you do not have to modify your working code to make it crash.)

Exercise 5 - A Shell Script to Convert CSV to HTML (3 Points)

Now that we have the attendance information in the form of a CSV file, we can generate other interesting visual formats of this information. In this exercise, you will convert the CSV into an HTML format using Unix commands.

    1. Write a shell script csv2html.sh that accepts two filenames as arguments.


 $ csv2html.sh /data/attendance.csv /data/attendance.html

The first argument is a CSV file that contains attendance information in the format produced by the C program. The second argument is the filename for the output HTML version. The script should work fine with both absolute and relative paths.

    2. You MUST use only the sed command (you may additionally use echo, if you need to) to generate the output HTML contents. Violation of this will result in 0 points for this exercise.

    3. The conversion can be easily done, once you understand the simple structure of HTML. Below is an example format (truncated for brevity).


$ cat attendance.html
<TABLE>
<TR><TD>User Email</TD>...<TD>A</TD>...<TD>I</TD> <TD>Attendance (Percentage)</TD></TR>
...
<TR><TD>adeline.larsen@mail.mcgill.ca</TD>...<TD>62</TD>...<TD>77.78</TD></TR>
...
<TR><TD>melody.cohen@mail.mcgill.ca</TD>...<TD>65</TD>...<TD>88.89</TD></TR>
...
</TABLE>

Basically, the entire content is enclosed between the HTML tags <TABLE> and </TABLE>. Further, each row (line/record from the CSV) is enclosed in <TR> and </TR> tags. Finally, each value itself is enclosed between <TD> and </TD> tags.

If you copy the attendance.html file to your computer and open it in your computer’s browser, it will look something like this (truncated).

[Screenshot: attendance.html rendered as a table in a browser]

There might be some minor visual differences based on your browser type and version, but that is ok.

    4. If you want to experiment with HTML table tags, here is a resource that can be helpful.

    5. There is no requirement for the shell script to perform any error checks, but it is once again recommended to include some to help you during development and debugging.

CHECKLIST: WHAT TO HAND IN

    • git.log

    • fixformat.sh

    • labapp.c zoomrecs.h zoomrecs.c makefile

    • gdb.txt

    • csv2html.sh

You may create an archive of the files to upload, if needed. If all of your code is in the directory FinalProj, you can archive them in the following manner.

 $ tar -zcvf FinalProj.tar.gz FinalProj

Please download your submitted files to double check that they are good. Submissions that are corrupted unfortunately cannot be graded. You will not get another chance if you forget to submit required files.

You do not have to turn in the executables and object files associated with your C program. Any such files will be ignored and TAs will compile your C program on their own as indicated in the problem descriptions above.

ASSUMPTIONS

    • For the C program, you can assume that each line in the CSV can be easily read into a char array of size 200.

    • You can assume that the given C struct is sufficient to store any values that you may see in the C program’s input.

    • You can assume that the names of the fields in the headers and their order remain the same and that no empty files will be used. All data will be ASCII (what we have seen so far in class), no Unicode, etc.

    • Letters in email ids are lowercase, and emails can additionally contain only the period (.) and the at sign (@).

    • You do not have to look for "bad data" in the attendance records.

    • You need not perform any explicit error checks for the exercises other than those explicitly stated. However, your program must not crash/fail/error for valid inputs.
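Under the assumptions above (fixed field order, no bad data, names and emails that fit the struct), one line of the consolidated CSV can be split into its fields with a single sscanf call. This is only a sketch: parse_record is a hypothetical helper name, and other approaches such as strtok would work as well.

```c
#include <stdio.h>

/* Hypothetical helper: splits "email,name,lab,duration" into its fields.
   Returns 1 when all four fields were read, 0 otherwise (for example,
   for the header line, whose third field is not a single letter). */
int parse_record(const char *line, char email[60], char name[60],
                 char *lab, int *duration)
{
    /* %59[^,] reads up to 59 characters that are not a comma. */
    return sscanf(line, "%59[^,],%59[^,],%c,%d",
                  email, name, lab, duration) == 4;
}
```

A caller might read each line with fgets into a char array of size 200, as the first assumption permits, and pass it to this helper.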


HINTS

You will find that the shell scripting techniques learned for mini3 and the C programming logic that you implemented for mini5 can be used to make your project development effort easier.

RESTRICTIONS

    1. None of your programs or scripts should create any intermediate (temporary use) files. They should only create the final output file. Violating this can result in additional deductions for respective exercises.

    2. All of your scripts and programs should execute in under a minute (in total) for an estimate of 100 Zoom records. (For some perspective, a naive implementation will execute in under a couple of seconds.)

    3. If your program crashes, goes into an infinite loop, etc., TAs may not execute the remaining test cases.

    4. Other individual restrictions are stated with each exercise. There are no further constraints.

TESTING

    • We recommend you start testing by creating only a couple of directories (say lab1, lab2) and a couple of files (say Lab-A, lab-C) with just 1-2 students to begin with. Keep in mind that your objective should be variety of data in testing and not necessarily its volume. For example, if you test with 100 records that are already in sorted order, you may not realize that your C program’s sort logic is not working. If you plan smart, you can test effectively with just a few records that you create.

    • Once you get the hang of it, you can try out a larger number of students. Remember! Debugging your logic for correctness using a large amount of data will be very difficult.

    • TAs will test using their own scripts, but those will follow the assumptions that have been set forth in the project description.

QUESTIONS?

Please use Piazza. However, before posting your question, use the search functionality to check if it has already been discussed. You should also look under the "Final Project general clarifications" pinned post to check if a popular question has already been included there. Such questions will not get individual responses again.

This is your final project and you are expected to be able to debug your program’s logical errors and other issues using the tools and techniques taught in class. TAs will not help you do your final project.

If your program crashes, use gdb to debug your program.

Responses are limited to clarifications on the project descriptions and expectations.























