Homework #4 Printer Spooler Solution

Starting from:

~~$35~~

$29

Home

Introduction

The goal of this assignment is to become familiar with low-level Unix/POSIX system

calls related to processes, signal handling, files, and I/O redirection.

You will implement a printer spooler program, called `imprimer`, that accepts user

requests to queue files for printing, cancel printing requests, pause and resume

print jobs, show the status of printers and print jobs, and set up pipelines to

convert queued files of one type to the type of file accepted by an available printer.

### Takeaways

After completing this assignment, you should:

* Understand process execution: forking, executing, and reaping.

* Understand signal handling.

* Understand the use of "dup" to perform I/O redirection.

* Have a more advanced understanding of Unix commands and the command line.

* Have gained experience with C libraries and system calls.

* Have enhanced your C programming abilities.

## Hints and Tips

* We **strongly recommend** that you check the return codes of **all** system calls

and library functions. This will help you catch errors.

* **BEAT UP YOUR OWN CODE!** Use a "monkey at a typewriter" approach to testing it

and make sure that no sequence of operations, no matter how ridiculous it may

seem, can crash the program.

* Your code should **NEVER** crash, and we will deduct points every time your

program crashes during grading. Especially make sure that you have avoided

race conditions involving process termination and reaping that might result

in "flaky" behavior. If you notice odd behavior you don't understand:

**INVESTIGATE**.

* You should use the `debug` macro provided to you in the base code.

That way, when your program is compiled without `-DDEBUG`, all of your debugging

output will vanish, preventing you from losing points due to superfluous output.

:nerd: When writing your program, try to comment as much as possible and stay

consistent with code formatting. Keep your code organized, and don't be afraid

to introduce new source files if/when appropriate.

### Reading Man Pages

This assignment will involve the use of many system calls and library functions

that you probably haven't used before.

As such, it is imperative that you become comfortable looking up function

specifications using the `man` command.

The `man` command stands for "manual" and takes the name of a function or command

(programs) as an argument.

For example, if I didn't know how the `fork(2)` system call worked, I would type

`man fork` into my terminal.

This would bring up the manual for the `fork(2)` system call.

:nerd: Navigating through a man page once it is open can be weird if you're not

familiar with these types of applications.

To scroll up and down, you simply use the **up arrow key** and **down arrow key**

or **j** and **k**, respectively.

To exit the page, simply type **q**.

That having been said, long `man` pages may look like a wall of text.

So it's useful to be able to search through a page.

This can be done by typing the **/** key, followed by your search phrase,

and then hitting **enter**.

Note that man pages are displayed with a program known as `less`.

For more information about navigating the `man` pages with `less`,

run `man less` in your terminal.

Now, you may have noticed the `2` in `fork(2)`.

This indicates the section in which the `man` page for `fork(2)` resides.

Here is a list of the `man` page sections and what they are for.

| Section | Contents |

| ----------------:|:--------------------------------------- |

| 1 | User Commands (Programs) |

| 2 | System Calls |

| 3 | C Library Functions |

| 4 | Devices and Special Files |

| 5 | File Formats and Conventions |

| 6 | Games et. al |

| 7 | Miscellanea |

| 8 | System Administration Tools and Daemons |

From the table above, we can see that `fork(2)` belongs to the system call section

of the `man` pages.

This is important because there are functions like `printf` which have multiple

entries in different sections of the `man` pages.

If you type `man printf` into your terminal, the `man` program will start looking

for that name starting from section 1.

If it can't find it, it'll go to section 2, then section 3 and so on.

However, there is actually a Bash user command called `printf`, so instead of getting

the `man` page for the `printf(3)` function which is located in `stdio.h`,

we get the `man` page for the Bash user command `printf(1)`.

If you specifically wanted the function from section 3 of the `man` pages,

you would enter `man 3 printf` into your terminal.

:scream: Remember this: **`man` pages are your bread and butter**.

Without them, you will have a very difficult time with this assignment.

## Getting Started

Fetch and merge the base code for `hw4` as described in `hw0`.

You can find it at this link: https://gitlab02.cs.stonybrook.edu/cse320/hw4

**NOTE:** For this assignment, you need to run the following command in order

for your Makefile to work:

```sh

$ sudo apt-get install libreadline-dev

```

The `sudo` password for your VM is `cse320` unless you changed it.

The above command installs the GNU `readline` library, which is a software library

that provides line-editing and history capabilites for interactive programs with a

command-line interface. It allows users to move the cursor, search the command history,

control a kill ring (which is just a more flexible version of a copy/paste clipboard)

and use tab completion on a text terminal. We highly recommend that you use it for

this assignment.

Here is the structure of the base code:

<pre

.

├── .gitlab-ci.yml

└── hw4

├── include

│ ├── debug.h

│ └── imprimer.h

├── lib

│ └── imp_util.o

├── Makefile

├── rsrc

│ └── imprimer.cmd

├── src

│ └── main.c

├── tests

└── util

├── printer

├── show_printers.sh

└── stop_printers.sh

</pre

If you run `make`, the code should compile correctly, resulting in an

executable `bin/imprimer`. If you run this program, it doesn't do very

much, because there is very little code -- you have to write it!

## `Imprimer`: Functional Specification

### Command-Line Interface

When started, `imprimer` should present the user with a command-line

interface with the following prompt

```sh

imp

```

Typing a blank line should should simply cause the prompt to be repeated,

without any other printout or action by the program.

Non-blank lines are interpreted as commands to be executed.

`Imprimer` commands have a simple syntax, in which each command consists

of a sequence of "words", which contain no whitespace characters,

separated by sequences of one or more whitespace characters.

The first word of each command is a keyword that names the command.

Any remaining words are the arguments to the command.

`Imprimer` should understand the following commands, with arguments as

indicated.

Square brackets are not part of the arguments; they merely identify arguments

that are optional.

* Miscellaneous commands

* `help`

* `quit`

* Configuration commands

* `type` *file_type*

* `printer` *printer_name* *file_type*

* `conversion` *file_type1* *file_type2* *conversion_program* [*arg1* *arg2* ...]

* Informational commands

* `printers`

* `jobs`

* Spooling commands

* `print` *file_name* [printer1 printer2 ...]

* `cancel` *job_number*

* `pause` *job_number*

* `resume` *job_number*

* `disable` *printer_name*

* `enable` *printer_name*

The `help` command takes no arguments, and it responds by printing a message

that lists all of the types of commands understood by the program.

The `quit` command takes no arguments and causes execution to terminate.

The `type` command declares *file_type* to be a file type to be supported

by the program. Possible examples (but not an exhaustive list) of file types

are: `pdf` (Adobe PDF), `ps` (Adobe Postscript), `txt` (text), `png` (PNG image files),

*etc*. A file will be presumed to be of a particular type when it has an extension

that matches that type. For example, `foo.txt` will be presumed to be a

text file, if `txt` has previously been declared using the `type` command.

Files whose names do not having an extension that matches a declared type

are considered of unknown type and are to be rejected if an attempt is made

to spool them for printing.

Essentially any identifier can be used as a file type -- they may (but aren't

required to) correspond to "known" file types that have standards, are supported

by other programs, *etc*.

The `printer` command declares the existence of a printer named *printer_name*,

which is capable of printing files of type *file_type*. The *printer_name*

is just an identifier, such as `Alice`. Each printer is only capable of printing

files of the (one) type that has been declared for it, and your program should take

care not to send a printer the wrong type of file.

The `conversion` command declares that files of type *file_type1* can be

converted into *file_type2* by running program *conversion_program* with any

arguments that have been indicated. It is assumed that `conversion_program` reads

input of type *file_type1* from the standard input and writes output of type

*file_type2* to the standard output, so that it is suitable for use in a pipeline

consisting of possibly several such programs. For example, on your Linux Mint VM:

* The command `pdf2ps - -` can be used to convert PDF read from the standard input

to Postscript on the standard output.

* The command `pbmtext` can be used to convert text read from the standard input

to a Portable Bitmap (pbm) file on the standard output.

* The command `pbmtoascii` can be used to convert a Portable Bitmap (pbm) file read

from the standard input to an ASCII graphics (i.e. text) file on the standard output.

* The command `pbmtog3` can be used to convert a Portable Bitmap (pbm) file read

from the standard input to a Group 3 FAX file (g3).

There are many others: some of them work well together and others do not.

For many of these commands there are also corresponding commands that convert formats

in the reverse direction.

The `printers` command prints a report on the current status of the declared printers,

one printer per line. For example:

```

imp printers

PRINTER, 0, alice, ps, disabled, idle

PRINTER, 1, bob, pcl, disabled, idle

```

The `jobs` command prints a similar status report for the print jobs that have

been queued. For example:

```

imp jobs

JOB, 0, 22 Oct 2018 16:08:11, pdf, queued, 22 Oct 2018 16:08:11, 0, , foo.pdf, ffffffff

JOB, 1, 22 Oct 2018 16:08:16, ps, queued, 22 Oct 2018 16:08:16, 0, , bar.ps, ffffffff

JOB, 2, 22 Oct 2018 16:08:34, txt, queued, 22 Oct 2018 16:08:34, 0, , mumble.txt, ffffffff

```

The formats of these status reports will be generated by functions that have been

provided for you, as discussed in more detail below. You **must** use the provided

functions, which will make it easier for us to automate some of the testing of

your program.

The `print` command sets up a job for printing *file_name*.

The specified file name must have an extension that identifies it as one of the

file types that have previously been declared with the `type` command.

If optional printer names are specified, then these printers must previously

have been declared using the `printer` command, and they define the set of

*eligible printers* for this job. Only a printer in the set of eligible printers

for a job should be used for printing that jobs. Moreover, an eligible printer

can only be used to print a job if there is a way to convert the file in the

job to the type that can be printed by that printer.

If no printer name is specified in the `print` command, then any declared

printer is an eligible printer.

The `cancel` command cancels an existing job. If the job is currently being

processed, then any processes in the conversion pipeline for that job

are terminated (by sending a `SIGTERM` signal to their process group).

The `pause` command pauses a job that is currently being processed.

Processes in the conversion pipeline for that job are stopped

(by sending a `SIGSTOP` signal to their process group).

The `resume` command resumes a job that was previously paused.

Processes in the conversion pipeline for that job are continued

(by sending a `SIGCONT` signal to their process group).

The `disable` command sets the state of a specified printer to "disabled".

This does not affect the status of any job currently being processed

by that printer, but a disabled printer is not eligible to accept any

further jobs until it has been re-enabled using the `enable` commnd.

The `enable` command sets the state of a specified printer to "enabled".

When a printer becomes enabled, if there is a pending job that can now be

processed by the newly enabled printer, then processing is immediately

started for one such job.

### Program Output

Your program **must** produce output in the following situations, which do

not necessarily occur in direct response to a user command:

* Whenever the status of a printer has changed. In this case, the output

must consist of a status line showing the new status of that printer,

formatted using the function (`imp_format_printer_status()`) we provide

for this purpose, as described for the `printers` command above.

* Whenever the status of a job has changed. In this case, the output

must consist of a status line showing the new status of that job,

formatted using the function (`imp_format_job_status()`) we provide for

this purpose, as described for the `jobs` command above.

* Whenever an error occurs while executing a user command. In this case,

the output must consist of a single line containing an error message that

has been formatted using the function (`imp_format_error_message()`)

that we provide for this purpose.

Your program is permitted to emit output in addition to that specified above,

but any such output must occur on a line that does **not** start with

"`PRINTER`", "`JOB`", `ERROR`, or "`imp`, so that we can filter it out if we need to.

### Batch Mode

The normal mode of operation of `imprimer` is as an interactive application.

However, it can also be run in batch mode, in which it reads commands

from a command file. If `imprimer` is started as follows:

```sh

$ imprimer -i command_file

```

then it begins by reading and executing commands from `command_file` until EOF,

at which point it presents the normal prompt and executes commands

interactively. Normally this feature would be used to cause configuration

commands (declarations of types, printers, and conversions) to be read from

a command file, rather than typed each time. If a `quit` command appears

in the command file, then the program terminates without entering interactive

mode. This can be used to run a series of commands completely automatically

without user intervention.

If `imprimer` is started with the "`-o` *output_file*" option, then any output

it produces that would normally appear on the terminal is to be redirected instead

to the specified output file.

### Reading Input

If the program is run in interactive mode (the default), then it should use

the `readline()` function to read commands from the user.

If the program is run in batch mode, then `readline` cannot be used, so in this

case the program will have to read commands using either the standard I/O library

or low-level Unix I/O.

### Processing Print Jobs

The purpose of `imprimer` is to process print jobs that are queued by the user.

Each time there is a change in status of a job or printer as a result of a user command

or the completion of a job being processed, `imprimer` must scan the set of queued jobs

to see if there are any that can now be processed, and if so, start them.

In order for a job to be processed, there must exist a printer that is enabled and

not busy, the printer must be in the `eligible_printers` set for that job,

and there must be a way to convert the type of file in the job to the type

of file the printer is capable of printing. If these conditions hold, then the

job status is set to `RUNNING`, the chosen printer is set to "busy" and the

`chosen_printer` field of the `JOB` structure is set to point to the `PRINTER`

that has been selected. A group of processes called *conversion pipeline*

is set up to run a series of programs that will convert the type of file in the job

to the type of file that the printer can print. This is described further below.

A job will exist at any given time in one of various states, the possibilities

for which are defined by the `JOB_STATUS` enum in `imprimer.h`.

These states and their meanings are:

* `QUEUED` -- The job has been created and is ready for processing.

A job will persist in this state only as long as there are no printers in

the set of eligible printers for that job that can be used to print the job.

As soon as an eligible printer (of an appropriate type) becomes available,

the job will transition to the `RUNNING` state.

* `RUNNING` -- An eligible printer of an appropriate type has been chosen for

the job and a conversion pipeline has been created to convert the file in the

job to the type of file that the printer is capable of printing.

The chosen printer must be among the printers in the `eligible_printers` set

for that job. For a job to be started on a printer, the printer must be "enabled"

and "not busy". The printer status is changed to "busy", and it stays that

way as long as the job is `RUNNING`.

* `PAUSED` -- A job that was previously `RUNNING` has temporarily been stopped

by sending a `SIGSTOP` signal to the process group of the processes in the conversion

pipeline. A job in the `PAUSED` state will remain in that state until a `resume`

command has been issued by the user. This will cause a `SIGCONT` signal to be

sent to the process group of the conversion pipeline.

:scream: The state of a job should **not** be changed immediately when the

user issues a `pause` command. Instead, the `SIGSTOP` signal should first be

sent and the state of the job changed from `RUNNING` to `PAUSED` only when a

`SIGCHLD` signal has subsequently been received and a call to the `waitpid()`

function returns showing `WIFSTOPPED` true of the process status.

Similarly, the state of a job should not be changed immediately when the

user issues a `resume` command, but only once a `SIGCHLD` signal has been

received and a subsequent call to `waitpid()` returns showing `WIFCONTINUED`

true of the process status.

* `COMPLETED` -- A job enters this state from the `RUNNING` state once processing

has completed and the processes in the conversion pipeline have terminated normally.

Once in the `COMPLETED` state, a job will remain in the queue so that its status

can be inspected, but it will be ineligible for further processing.

A job will remain in the queue until just after the execution of the first user command

that is issued after the job has been completed for one minute.

At that point the job will be deleted from the queue and freed.

* `ABORTED` -- A job enters this state from the `RUNNING` state if one or more

processes in the job pipeline terminate abnormally. Once having entered the `ABORTED`

state, a job is treated similarly to a `COMPLETED` job as described above.

The `imprimer` program must install a `SIGCHLD` handler so that it can be notified

immediately upon completion of a job being processed. The handler must appropriately

update the job and printer status information and start any further jobs in the

queue that can now be processed by virtue of the printer having become available.

:nerd: Note that you will need to use `sigprocmask()` to block signals at appropriate times,

to avoid races between the handler and the main program, the occurrence of which which will

result in indeterminate behavior.

Each time the status of a job or printer changes, your program must immediately

print a corresponding status line (created using the `imp_format_job_status()`

or `imp_format_printer_status()` functions). This is so that so that it is evident

(to those grading your program) what and when state changes have occurred.

### Conversion Pipelines

In order to determine whether a particular printer can be used to service a

particular job, it will be necessary to determine whether there is a sequence

of conversions that can be used to transform the type of the file in the job

to the type of file the printer can print. To determine this, you will need

to maintain a suitable data structure (*e.g.* a matrix) to record the information

supplied by the user in the form of `conversion` commands, and you will need

to use a suitable algorithm (e.g. breadth-first search) to search for a path

of conversions between the two file types. If it exists, then the path of

conversions (and the associated conversion commands) forms the basis for setting

up the *conversion pipeline* to process the job.

Creation of a conversion pipeline should be begun by the main program forking

a single process to serve as the "master" process for the pipeline. This process

should use `setpgid()` to set its process group ID to its own process ID.

The master process will then fork one child process for each link in the

conversion path between the type of the file in the job and the type of file

that the chosen printer can print. Redirection should be used so that

the standard input of the first process in the pipeline is the file to be printed

and the standard output of the last process in the pipeline is the chosen

printer (for which a file descriptor has been obtained using `imp_connect_to_printer()`).

In addition, the `pipe()` and `dup()` (or `dup2()`) system calls should be used

to arrange to connect the standard output of each intermediate process in the

pipeline to the standard input of the next process.

Each process in the pipeline will execute (using `execve()`) one of the conversion

commands (previously declared by the user using the `conversion` command) to convert

the file read on its standard input to the type required by the next process in the

pipeline.

:nerd: It is possible that the type of the queued file is the same as the

type of file the printer can print. In this case, no conversion is required,

and the conversion program will consist of the master process and a single

child process, which should execute the program `/bin/cat` with no arguments.

The master process of a conversion pipeline is used to simplify the interaction

of the conversion pipeline with the main process. Since the master process creates

its own process group before forking the child processes, all the child processes

will exist in that process group. The processes in the pipeline can therefore

be paused and resumed by using `killpg()` to send a `SIGSTOP` or `SIGCONT` to

that process group. Only the master process is a child of the main process,

so the main process only has to keep track of the process ID for the master process

of each conversion pipeline that it starts.

The master process of a conversion pipeline will need to keep track of its child

processes, and to use `waitpid()` to reap them and collect their exit status.

If any child process terminates by a signal or with a nonzero exit status,

then the conversion pipeline will be deemed to have failed and the master process

should exit with a nonzero exit status.

The main process should interpret the nonzero exit status as an indication that

the job has failed, and it should set the job to the `ABORTED` state.

If all of the child processes in a conversion pipeline terminate normally with

zero exit status, then the master process should also terminate normally with

zero exit status. The main process should interpret this situation as an indication

that the job has succeeded, and it should set the job the the `COMPLETED` state.

**Important:** You **must** create the processes in a conversion pipeline using

calls to `fork()` and `execve()`. You **must not** use the `system()` function,

nor use any form of shell in order to create the pipeline, as the purpose of

the assignment is to giving you experience with using the system calls involved

in doing this.

## Provided Components

### The `imprimer.h` Header File

The `imprimer.h` header file that we have provided defines function prototypes

for the functions you are to use to format output for your program and to make

connections to printers. It also contains definitions of some constants and data

types related to these functions.

:scream: **Do not make any changes to `imprimer.h`. It will be replaced

during grading, and if you change it, you will get a zero!**

### The `imp_util.o` Library

We have provided you with an object file `imp_util.o` (in the `lib` directory),

which will be automatically linked with your program. This contains several functions,

whose prototypes are given in the `imprimer.h` header file, which you should use

as follows:

* char *imp_format_printer_status(PRINTER *printer, char *buf, size_t size);

* char *imp_format_job_status(JOB *job, char *buf, size_t size);

* char *imp_format_error_message(char *msg);

You **must** use these functions to format required output by your program.

The reason for this is so that everyone's program produces output in a uniform format

that we might have a fighting chance to process automatically.

The output looks a bit odd, but you will note that it is in comma-separated-value

(CSV) format that can be readily parsed.

These functions take a buffer that you supply, along with its size, and

they return a pointer to that same buffer.

* int imp_connect_to_printer(PRINTER *printer);

This is the function you **must** use to connect to a printer. If successful,

it returns a file descriptor to be used to send data to the printer;

if unsuccessful, -1 is returned. If the printer is not currently "up",

then it will be started (see about the `printer` program below).

In order to interface with the above functions, the header file `imprimer.h` defines

structure types `PRINTER` and `JOB`. You *must* pass in instances of these structures

that have all fields properly initialized. The meaning of each of the fields is

documented in the comments in the `imprimer.h` file. Each of these structures also

has an additional `other` field, which can be used to point to arbitrary information

of your own choosing should you have a need to do so. The functions above ignore the

value of this field, so there is no harm if you don't initialize it.

### The `printer` Program

The `printer` program we have provided (in the `util` directory) simulates a printer

that you can connect to and send data to. It doesn't actually "print" anything,

but it does log any files you send to it in `spool` directory. It also maintains

a debug log in that directory, in case it is necessary to get some idea of what the

printer has been doing.

A printer is automatically started when you try to connect to it using the

`imp_connect_to_printer()` function, if it is not already up.

You can also start a printer "manually" by a command of the following form:

```sh

$ util/printer [-d] [-f] PRINTER_NAME FILE_TYPE

```

This starts a printer with name `PRINTER_NAME`, which is capable of printing files of

type `FILE_TYPE`. Each printer that is started must have a unique name; if you try

to start a second printer with the same name as an existing printer, the second

command will fail. Once started, printers stay "up" until they are explicitly stopped,

You can stop all printers using the command `make stop_printers`.

The command `make show_printers` can be used to show you the printers that are currently up.

The optional `-d` and `-f` arguments to the `printer` command are used to cause the

printer to exhibit some random behavior. If `-d` is specified, then random delays

might occur during "printing". If `-f` is specified, then the printer will be "flaky",

which means that it might disconnect at random times, causing the conversion pipeline

to fail. The `imp_connect_to_printer()` function has a `flags` argument that can

also be used to specify these flags. The flags only take effect when the printer

is first started; once a printer is "up", the flags passed when connecting to it

have no further effect. The flags should be the bitwise "or" of one or more of

`PRINTER_NORMAL`, `PRINTER_DELAY`, and `PRINTER_FLAKY`.

### The `show_printers.sh` and `stop_printers.sh` Shell Scripts

The `util` directory contains shell scripts `show_printers.sh` and `stop_printers.sh`.

These are most easily invoked using `make show_printers` or `make stop_printers`,

though they can also be run directly.

### The `imprimer.cmd` File

The `rsrc` directory contains a file `imprimer.cmd`. This is a sample

file that contains some `imprimer` configuration commands that can be read using

the `-i` option. For example:

### The `spool` Directory

The `spool` directory is created by `make` in order to store various files created

by the "printers". For example, if a printer is started with name `alice`, then

`spool/alice.log` will contain debug log information, `spool/alice.pid` will contain

the process ID of the printer process (for use by `stop_printers.sh`),

`spool/alice.sock` will be a "socket" that is used by `imp_connect_to_printer()`

to connect the printer. Also, each time a file is "printed" the data that was received

is stored in a separately named file in this directory.

The `spool` directory is not removed by a normal `make clean`.

To remove the `spool` directory and all its contents, you can use `make clean_spool`.

```sh

$ imprimer -i rsrc/imprimer.cmd

```

## Other Notes

* At some point after this assignment has initially been handed out, I will probably

make available either a list of "recommended" commands to use in a conversion pipeline,

or I will make available some "dummy" commands for testing purposes.

Not having these commands available should not stop you from getting started;

you can always test your program using `cat` as a "conversion" command.

* I am considering making things more interesting and realistic by adding an additional

`display` command to `imprimer` (similar to the `printer` command) together with a

corresponding`display` program (similar to the `printer` program). The purpose of this

would be to provide a way for files to be actually "printed" by running a command that

displays them graphically in a window. If I decide to do this, I will make an announcement

about it and update this document.

## Hand-in instructions

As usual, make sure your homework compiles before submitting.

Test it carefully to be sure that doesn't crash or exhibit "flaky" behavior

due to race conditions.

Use `valgrind` to check for memory errors and leaks.

Besides `--leak-check=full`, also use the option `--track-fds=yes`

to check whether your program is leaking file descriptors because

they haven't been properly closed.

You might also want to look into the `valgrind` `--trace-children` and related

options.

Submit your work using `git submit` as usual.

This homework's tag is: `hw4`.