Multiprocessing and File Statistics
A Unix systems programmer has the ability to fork/exec binaries, to establish pipes for synchronous communication between the processes executing those binaries, and to use signals for asynchronous interaction between those processes. This permits her to “reuse” small utilities to produce more complex functions. This is particularly useful when there is no source available for the small utilities.
Components
This assignment requires that you build a five-process system from three relatively small programs.
The totalsize Program
The totalsize program expects that its standard input is a list of file names separated by “whitespace” (see the isspace(3) manual page). For all regular files whose names are on its stdin, it computes the total size of all those files it can access. That total size is sent to its standard output as a nicely formatted integer (a string that looks like an integer, actually). For example, suppose a directory contains the following files (the -lu option requests a long directory listing with time of last access shown):
% ls -lu
total 27
-rwxr-xr-x  1 kearns  wheel  24576 Feb 23 13:51 a.out
-rw-r--r--  2 kearns  wheel    612 Feb 21 13:51 fileinfo.c
-rw-r--r--  2 kearns  wheel    612 Feb 21 13:51 link.c
lrwxrwxrwx  1 kearns  wheel     10 Feb 23 14:20 symlink.c -> fileinfo.c
Below we show the invocation of totalsize assuming that the above directory is the current working directory:
% echo ". .. a.out fileinfo.c link.c symlink.c" | ./totalsize
25188
This indicates that there are 25188 bytes of storage tied up in the regular files in the list. The directories . and .. are not regular files; fileinfo.c, link.c, and symlink.c all refer to the same regular file (the first two are hard links, and the last is a symbolic link), so its 612 bytes are counted only once: 24576 + 612 = 25188.
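One way to count each underlying file only once is to remember the (device, inode) pair of every regular file already seen. The following is a minimal sketch under that assumption, not the required implementation; the names total_regular_size and MAX_SEEN are hypothetical, and the fixed-size table is only there to keep the example short.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>

#define MAX_SEEN 1024   /* hypothetical cap on distinct files, for brevity */

/* Sum the sizes of the regular files named in names[0..n-1], counting each
 * underlying file once no matter how many links refer to it. */
long long total_regular_size(const char *const *names, size_t n)
{
    struct { dev_t dev; ino_t ino; } seen[MAX_SEEN];
    size_t nseen = 0;
    long long total = 0;

    for (size_t i = 0; i < n; i++) {
        struct stat sb;

        /* stat() follows symbolic links, so symlink.c resolves to fileinfo.c. */
        if (stat(names[i], &sb) == -1)
            continue;                   /* nonexistent or inaccessible: skip */
        if (!S_ISREG(sb.st_mode))
            continue;                   /* directories, devices, ...: skip */

        /* Hard links and resolved symlinks share one (device, inode) pair. */
        int duplicate = 0;
        for (size_t j = 0; j < nseen; j++) {
            if (seen[j].dev == sb.st_dev && seen[j].ino == sb.st_ino) {
                duplicate = 1;
                break;
            }
        }
        if (duplicate)
            continue;

        /* Past MAX_SEEN entries this sketch stops remembering files, so
         * later duplicates would be counted again. */
        if (nseen < MAX_SEEN) {
            seen[nseen].dev = sb.st_dev;
            seen[nseen].ino = sb.st_ino;
            nseen++;
        }
        total += (long long)sb.st_size;
    }
    return total;
}
```

With the example directory, passing the six names from the echo command should give 24576 + 612, since . and .. fail the S_ISREG test and the three names for fileinfo.c share one inode.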
If the UNITS environment variable has value "K" or "k" when totalsize is invoked, then the total size of the named regular files in kilobytes is printed as an integer (truncate, don’t round up) followed by the string “kB.”
% export UNITS=K
% echo ". .. a.out fileinfo.c link.c symlink.c" | ./totalsize
24kB
% export UNITS=bogus
% echo ". .. a.out fileinfo.c link.c symlink.c" | ./totalsize
25188
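The UNITS rule can be sketched as a small formatting helper. This is an illustration rather than a prescribed design: format_total is a hypothetical name, and the divisor 1024 is inferred from the example above (25188 bytes prints as 24kB, which rules out 1000).

```c
#include <stdio.h>
#include <string.h>

/* Write the total into buf the way totalsize prints it: plain bytes by
 * default, whole kilobytes followed by "kB" when units is "K" or "k".
 * Any other value of units, including NULL (unset), means bytes. */
void format_total(long long bytes, const char *units, char *buf, size_t len)
{
    if (units != NULL && (strcmp(units, "K") == 0 || strcmp(units, "k") == 0))
        snprintf(buf, len, "%lldkB", bytes / 1024);  /* integer division truncates */
    else
        snprintf(buf, len, "%lld", bytes);
}
```

A caller would typically pass getenv("UNITS") as the units argument; getenv returns NULL when the variable is unset, which this sketch treats the same as an unrecognized value.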
If the TSTALL environment variable looks like a positive non-zero integer, then totalsize should sleep for that many seconds before inputting a file name.
If the TMOM environment variable looks like a positive non-zero integer, then totalsize should treat the value as a pid and signal that pid with a SIGUSR1 after it has produced all of its output.
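Both TSTALL and TMOM need the same check: does the value look like a positive integer? The sketch below shows one possible reading of that check using strtol(); parse_positive and notify_pid are hypothetical names, and treating leading whitespace or a leading "+" as acceptable is an assumption inherited from strtol().

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>

/* Return the value of s if it is entirely a decimal integer greater than
 * zero; return 0 if s is NULL, empty, malformed, non-positive, or too
 * large to fit in a long. */
long parse_positive(const char *s)
{
    char *end;
    long v;

    if (s == NULL || *s == '\0')
        return 0;
    errno = 0;
    v = strtol(s, &end, 10);
    if (errno == ERANGE || end == s || *end != '\0' || v <= 0)
        return 0;
    return v;
}

/* Send SIGUSR1 to the pid named by value (for example, getenv("TMOM")).
 * Returns 0 if the signal was sent, -1 if value is unusable or kill() fails. */
int notify_pid(const char *value)
{
    long pid = parse_positive(value);

    if (pid == 0)
        return -1;
    return kill((pid_t)pid, SIGUSR1);
}
```

In totalsize, the result of parse_positive(getenv("TSTALL")) could be handed to sleep() before each name is read. It is probably wise to fflush(stdout) before calling notify_pid(getenv("TMOM")), so that the total is already in the pipe when the signalled process goes to read it.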
The accessed Program
Like totalsize, the accessed program takes a list of file names on its standard input. It also takes a mandatory argument. The program must be invoked as
accessed num
where num is an integer. If num is positive, accessed outputs, on its standard output, those regular files to which it has access which have not been accessed for num days. For example, assuming that it is about 3 p.m. on February 23:
% echo ". .. a.out fileinfo.c link.c symlink.c" | ./accessed 1
link.c
% echo ". .. a.out fileinfo.c link.c symlink.c" | ./accessed 5
%
When there are multiple links to a file, any (single) link will do as the name of the file.
If num is negative, accessed outputs, on its standard output, those regular files to which it has access which have been accessed within the last |num| days. Again, assuming that it is about 3 p.m. on February 23:
% echo ". .. a.out fileinfo.c link.c symlink.c" | accessed -1
a.out
% echo ". .. a.out fileinfo.c link.c symlink.c" | accessed -5
a.out
link.c
A value of 0 for the integer argument is invalid. A value that is too large or too small (i.e., too negative) to be represented on the system is also invalid.
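The argument rules and the two selection modes can be sketched as follows. This is one reading of the specification, not the definitive one: parse_num and should_list are hypothetical names, a day is taken to be 86400 seconds, and a file idle for exactly num days is treated as "not accessed for num days". The atime argument is meant to be the st_atime field filled in by stat().

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <time.h>

#define SECONDS_PER_DAY 86400L

/* Parse the argument to accessed. Returns 0 and stores the value in *out on
 * success; returns -1 for zero, malformed text, or out-of-range values. */
int parse_num(const char *s, long *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s || *end != '\0')
        return -1;                      /* not an integer at all */
    if (errno == ERANGE)
        return -1;                      /* too large or too negative */
    if (v == 0 || v == LONG_MIN)
        return -1;                      /* 0 is invalid; LONG_MIN cannot be negated */
    *out = v;
    return 0;
}

/* Decide whether a file last accessed at atime should be listed.
 * num > 0: list files NOT accessed for num days.
 * num < 0: list files accessed within the last |num| days. */
int should_list(time_t atime, time_t now, long num)
{
    double idle = difftime(now, atime); /* seconds since last access */

    if (num > 0)
        return idle >= (double)num * SECONDS_PER_DAY;
    return idle < (double)(-num) * SECONDS_PER_DAY;
}
```

Checked against the example: at 3 p.m. on February 23, a.out has been idle for about an hour and fileinfo.c for about two days and one hour, so an argument of 1 selects only the fileinfo.c inode, 5 selects nothing, -1 selects only a.out, and -5 selects both.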
The Driver Program, report
The report program will be responsible for creating and interconnecting the other processes of this multiprocess computation.
• It takes a list of filenames on its stdin.
• It creates four child processes: two execute the totalsize binary and two execute the accessed binary.
• Pipes must be established so that report feeds the file names read on its stdin onto the stdin of both children running accessed.
• Pipes must be established to bind the stdout of each accessed process onto the stdin of an associated totalsize binary.
• Pipes must be established so that report can read the stdout of both totalsize children.
• The report program accepts a mandatory non-zero positive integer, num, as its first command line argument. The integer specifies a duration in days. The first accessed process will essentially be invoked as
accessed num
The second will essentially be invoked as
accessed -num
The net effect is that the first accessed process should output those files which have not been accessed for num days. The second should output those files which have been accessed within num days.
• It accepts an optional -k as a command line argument that causes it to display all size information in kilobytes.
• It accepts an optional -d as a command line option. If -d is asserted the next command line argument must be a positive non-zero integer that specifies the number of seconds that the totalsize processes sleep before inputting a file name.
• After writing the last file name onto its output pipes, it enters a non-terminating loop in which it sleeps for 1 second and outputs an asterisk on its standard output.
• Upon receipt of a SIGUSR1 signal, it exits the loop, reads the output of the first totalsize process, and outputs a nicely formatted message. It prints a separator line (perhaps hyphens or asterisks). It then outputs a message incorporating the stdout of the second totalsize process.
• On termination, no garbage files or orphan processes should be left on the system.
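The pipe plumbing described in the list above can be sketched with three small helpers. This is an illustrative approach, not the required one: make_pipe, spawn_filter, and reap are hypothetical names. The sketch marks every pipe end close-on-exec, on the assumption that stray inherited pipe ends are the most likely reason a child never sees end-of-file on its stdin.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Create a pipe whose two ends are both close-on-exec, so that an exec'd
 * child keeps only the ends explicitly dup2()'d onto its stdin or stdout. */
int make_pipe(int fds[2])
{
    if (pipe(fds) == -1)
        return -1;
    if (fcntl(fds[0], F_SETFD, FD_CLOEXEC) == -1 ||
        fcntl(fds[1], F_SETFD, FD_CLOEXEC) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    return 0;
}

/* Fork a child that runs argv[0] with in_fd as its stdin and out_fd as its
 * stdout. Returns the child's pid in the parent, or -1 if fork() fails. */
pid_t spawn_filter(char *const argv[], int in_fd, int out_fd)
{
    pid_t pid = fork();

    if (pid != 0)
        return pid;                     /* parent, or fork() failure (-1) */

    /* Child: descriptors made by dup2() do not carry FD_CLOEXEC, so 0 and 1
     * survive the exec while every original pipe end is closed by it. */
    if (dup2(in_fd, STDIN_FILENO) == -1 || dup2(out_fd, STDOUT_FILENO) == -1)
        _exit(126);
    execvp(argv[0], argv);
    _exit(127);                         /* reached only if the exec failed */
}

/* Wait for one child so that it is not left behind as a zombie. Returns its
 * exit status, or -1 if it did not exit normally or waitpid() failed. */
int reap(pid_t pid)
{
    int status;

    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In report, each accessed child would be spawned with the read end of a pipe from report as in_fd and the write end of a pipe to its totalsize partner as out_fd. After spawning, the parent still has to close the ends it does not use itself; otherwise the readers downstream never reach end-of-file.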
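The following figure shows how a running system should appear. The heavy lines connecting the processes represent pipe-based communication.
[Figure: report reads the file list and writes it to the stdin (0) of each of the two accessed processes; the stdout (1) of each accessed process is piped to the stdin (0) of its own totalsize process; the stdout (1) of each totalsize process is piped back to report, which produces the output.]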
In the context of our example, here are some executions of the system:
% echo ". .. a.out fileinfo.c link.c symlink.c" | report 1 -d 2
******
A total of 612 bytes are in regular files not accessed for 1 days.
----------
A total of 24576 bytes are in regular files accessed within 1 days.
% echo ". .. a.out fileinfo.c link.c symlink.c" | report 5 -k
A total of 0kB are in regular files not accessed for 5 days.
----------
A total of 24kB are in regular files accessed within 5 days.
Note that the number of asterisks is roughly proportional to the delay (if any).
Notes
The totalsize and accessed programs must be able to run as stand-alone binaries. We will test them in stand-alone mode before testing the full-blown system of five processes.
The TA will look at your source code to make sure you follow the termination protocol as specified (i.e., that one of the totalsize processes signals report to pull it out of the asterisk-printing loop). You are well advised to make your source code neat and readable.
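The report side of this termination protocol might look like the sketch below. It is a minimal illustration under stated assumptions: install_usr1_handler and asterisk_loop are hypothetical names, and the handler does nothing but set a flag. It presumes that report has arranged for one totalsize child to learn its pid (for instance through TMOM).

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t got_usr1 = 0;

static void on_usr1(int signo)
{
    (void)signo;
    got_usr1 = 1;       /* only set a flag; the loop does the real work */
}

/* Install the SIGUSR1 handler. Returns 0 on success, -1 on failure. */
int install_usr1_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Print one asterisk per elapsed second until SIGUSR1 has arrived.
 * Returns the number of asterisks printed. */
int asterisk_loop(FILE *out)
{
    int count = 0;

    while (!got_usr1) {
        sleep(1);               /* returns early if a signal interrupts it */
        if (got_usr1)
            break;
        fputc('*', out);
        fflush(out);            /* make each asterisk visible immediately */
        count++;
    }
    return count;
}
```

Installing the handler before any child is forked avoids the case where an early SIGUSR1 meets the default action, which terminates the process. A signal that lands between the loop test and the sleep() call costs at most one extra second in this sketch.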
You must use the fork() system call and any of the variants of the exec() system call/library function that you choose. This implies that you may not use the system() and popen() library functions (or similar) in this assignment.
All information about a file can be obtained from the stat() system call. I will have a special mini-lecture on the use of stat() and computations that involve real time.
Due
The submission deadline is 11:59pm on Tuesday, February 28.
You must structure your system to have three source files, report.c, totalsize.c, and accessed.c. Follow these steps to submit:
1. tar zcvf system.tgz report.c totalsize.c accessed.c
This produces a gzipped (compressed) “tarball” consisting of your three source files.
2. gpg -r CSCI415 --sign -e system.tgz
This signs and encrypts your gzipped tarball so the TA can retrieve the submission.
3. mv system.tgz.gpg ~; chmod 644 ~/system.tgz.gpg
This moves your submission to your home directory and protects it so that it can be copied by the TA.