In this assignment, you will write a command line utility to translate MIPS
machine code between binary and human-readable mnemonic form.
The goal of this homework is to familiarize yourself with C programming,
with a focus on input/output, strings in C, and the use of pointers.
You **MUST** write your helper functions in a file separate from `main.c`. The
`main.c` file **MUST ONLY** contain `#include`s, local `#define`s and the `main`
function. This is the only requirement for project structure. Beyond this, you
may have as many or as few additional `.c` files in the `src` directory as you
wish. Also, you may declare as many or as few headers as you wish. In this
document, we use `hw1.c` as our example file containing helper functions.
# Getting Started
Fetch base code for `hw1` as described in `hw0`. You can find it at this link:
[https://gitlab02.cs.stonybrook.edu/cse320/hw1](https://gitlab02.cs.stonybrook.edu/cse320/hw1).
Both repos will probably have a file named `.gitlab-ci.yml` with different contents.
Simply merging these files will cause a merge conflict. To avoid this, we will
merge the repos using a flag so that the `.gitlab-ci.yml` found in the `hw1`
repo will be the file that is preserved.
To merge, use this command:
```
git merge -m "Merging HW1_CODE" HW1_CODE/master --strategy-option theirs
```
Here is the structure of the base code:
<pre>
hw1
├── include
│ ├── const.h
│ ├── debug.h
│ ├── hw1.h
│ └── instruction.h
├── Makefile
├── rsrc
│ ├── bcond.asm
│ ├── bcond.bin
│ ├── examples.asm
│ ├── examples.bin
│ ├── jump.asm
│ ├── jump.bin
│ ├── matmult.asm
│ ├── matmult.bin
│ ├── typei.asm
│ ├── typei.bin
│ ├── typer.asm
│ └── typer.bin
├── src
│ ├── hw1.c
│ ├── instr_table.c
│ └── main.c
└── tests
└── hw1_tests.c
</pre>
> :nerd: Reference for pointers: [http://beej.us/guide/bgc/output/html/multipage/pointers.html](http://beej.us/guide/bgc/output/html/multipage/pointers.html).
> :nerd: Reference for command line arguments: [http://beej.us/guide/bgc/output/html/multipage/morestuff.html#clargs](http://beej.us/guide/bgc/output/html/multipage/morestuff.html#clargs).
**Note**: All commands from here on are assumed to be run from the `hw1` directory.
## A Note about Program Output
What a program does and does not print is VERY important.
In the UNIX world stringing together programs with piping and scripting is
commonplace. Although combining programs in this way is extremely powerful, it
means that each program must not print extraneous output. For example, you would
expect `ls` to output a list of files in a directory and nothing else.
Similarly, your program must follow the specifications for normal operation.
One part of our grading of this assignment will be to check whether your program
produces EXACTLY the specified output. If your program produces output that deviates
from the specifications, even in a minor way, or if it produces extraneous output
that was not part of the specifications, it will adversely impact your grade
in a significant way, so pay close attention.
**Use the debug macro `debug` (described in the 320 reference document in the
Piazza resources section) for any other program output or messages you may need
while coding (e.g. debugging output).**
# Part 1: Program Operation and Argument Validation
In this part, you will write a function to validate the arguments passed to your
program via the command line. Your program will support the following flags:
- If no flags are provided, you will display the usage and return with an
`EXIT_FAILURE` return code
- If the `-h` flag is provided, you will display the usage for the program and
exit with an `EXIT_SUCCESS` return code
- If the `-a` flag is provided, you will perform text-to-binary conversion
(i.e. "assembly"), reading text from `stdin` and writing binary to `stdout`.
- If the `-d` flag is provided, you will perform binary-to-text conversion
(i.e. "disassembly"), reading binary from `stdin` and writing text to `stdout`.
> The `-a` and `-d` flags are not allowed to be used in combination with each
> other.
> :nerd: `EXIT_SUCCESS` and `EXIT_FAILURE` are macros defined in `<stdlib.h>` which
> represent success and failure return codes respectively.
> :nerd: `stdin`, `stdout`, and `stderr` are special files that are opened upon
> execution for all programs and do not need to be reopened.
Some of these operations will also need other command line arguments which are
described in each part of the assignment. The two usages for this program are:
<pre>
usage: ./hw1 -h [any other number or type of arguments]
usage: bin/hw1 [-h] -a|-d [-b BASEADDR] [-e ENDIANNESS]
-a Assemble: convert mnemonics to binary code
-d Disassemble: convert binary code to mnemonics
Additional parameters: [-b BASEADDR] [-e ENDIANNESS]
-b BASEADDR is the starting memory address for the code
It must be a hexadecimal number of 8 digits or less
-e ENDIANNESS specifies the byte order of the binary code
It must be a single character:
b for big-endian, or
l for little-endian
-h Display this help menu.
</pre>
A valid invocation of the program implies that the following hold about
the command-line arguments:
- All positional arguments (`-a|-d`) come before any optional
arguments (`-b` and `-e`). The optional arguments may come in any order
after the positional ones.
- If the `-h` flag is provided, it is the first positional argument after
the program executable.
- If an option requires a parameter, the corresponding parameter must be provided
(e.g. `-e` must always be followed by an ENDIANNESS specification).
- If `-b` is given, the BASEADDR argument will be given as a hexadecimal
number in which, in addition to the digits ('0'-'9'), either upper-case letters
('A'-'F') or lower-case letters ('a'-'f') may be used, in any combination.
- If `-e` is given, then the ENDIANNESS argument will be a single word
(i.e. will have no whitespace).
> :scream: You may only use `argc` and `argv` for argument parsing and
> validation. Using any libraries that parse command line arguments (e.g.
> `getopt`) is prohibited.
> :scream: Any libraries that help you parse strings are prohibited as well
> (`string.h`, `ctype.h`, etc). *This is intentional and will help you
> practice parsing strings and manipulate pointers.*
> :scream: You **MAY NOT** use dynamic memory allocation in this assignment
> (i.e. `malloc`, `realloc`, `calloc`, `mmap`, etc)
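
Since `string.h` is off limits, comparing an `argv` entry against a flag such as `-a` has to be done by hand. The following is a minimal sketch of one way to do that with pointers only. The name `str_equal` is hypothetical and is not part of the base code.

```c
/* Hypothetical helper (not part of the base code): compares two
 * NUL-terminated strings using only pointer arithmetic.
 * Returns 1 if they are identical, 0 otherwise. */
int str_equal(const char *a, const char *b) {
    /* Advance while both strings agree and the first has not ended. */
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }
    /* Equal only if both strings ended at the same position. */
    return *a == *b;
}
```

A call such as `str_equal(argv[1], "-h")` could then stand in for a `strcmp`-based test.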
For example, the following are a subset of the possible valid argument
combinations:
- `$ bin/hw1 -h ...`
- `$ bin/hw1 -a`
- `$ bin/hw1 -a -e b`
- `$ bin/hw1 -d -b D000d000 -e l`
Some examples of invalid orderings would be:
- `$ bin/hw1 -e b -d`
- `$ bin/hw1 -b D000d000 -a -e b`
> :scream: The `...` means that all arguments, if any, are to be ignored; e.g.
> the usage `bin/hw1 -h -a -b D00D000 -e b` is equivalent to `bin/hw1 -h`
**NOTE:** The makefile compiles the `hw1` executable into the `bin` folder.
Assume all commands in this doc are run from the `hw1` directory of your
repo.
### **Required** Validate Arguments Function
In `const.h`, you will find the following function prototype (function
declaration) already declared for you. You **MUST** implement this function
as part of the assignment.
<pre>
/**
* @brief Validates command line arguments passed to the program.
* @details This function will validate all the arguments passed to the
* program, returning 1 if validation succeeds and 0 if validation fails.
* Upon successful return, the selected program options will be set in the
* global variable "global_options", where they will be accessible
* elsewhere in the program.
*
* @param argc The number of arguments passed to the program from the CLI.
* @param argv The argument strings passed to the program from the CLI.
* @return 1 if validation succeeds and 0 if validation fails.
* Refer to the homework document for the effects of this function on
* global variables.
* @modifies global variable "global_options" to contain a bitmap representing
* the selected options.
*/
int validargs(int argc, char **argv);
</pre>
> :scream: This function must be implemented as specified as it will be tested
> and graded independently. **It should always return -- the USAGE macro should
> never be called from validargs.**
The `validargs` function should return 0 if there is any form of failure.
This includes, but is not limited to:
- Invalid number of arguments (too few or too many)
- Invalid ordering of arguments
- A missing parameter to an option that requires one (e.g. `-e` with no
ENDIANNESS specification).
- Invalid base address (if one is specified). A base address is invalid
if it contains characters other than the digits ('0'-'9'), upper-case
letters ('A'-'F'), and lower-case letters ('a'-'f'), if it is more than
8 digits in length, or if it is not a multiple of 4096
(i.e. the twelve least-significant bits of its value are not all zero).
- Invalid endianness (if one is specified). An endianness is invalid
if either it does not consist of a single character or that single character
is not either 'b' or 'l'.
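
The base address rules above can be checked in a single pass over the argument string. The sketch below is one possible approach, not the required implementation. The name `parse_baseaddr` is hypothetical, rejecting an empty string is an assumption (the rules above do not mention that case), and `unsigned int` is assumed to be 32 bits wide.

```c
/* Hypothetical helper: parses a BASEADDR argument given as hexadecimal text.
 * Returns 1 and stores the value in *out on success, 0 if the string is
 * not a valid base address. Assumes unsigned int is 32 bits wide. */
int parse_baseaddr(const char *s, unsigned int *out) {
    unsigned int value = 0;
    int digits = 0;

    for (; *s != '\0'; s++) {
        unsigned int d;
        if (*s >= '0' && *s <= '9')
            d = (unsigned int)(*s - '0');
        else if (*s >= 'a' && *s <= 'f')
            d = (unsigned int)(*s - 'a') + 10;
        else if (*s >= 'A' && *s <= 'F')
            d = (unsigned int)(*s - 'A') + 10;
        else
            return 0;               /* not a hexadecimal digit */
        if (++digits > 8)
            return 0;               /* more than 8 digits */
        value = (value << 4) | d;
    }
    if (digits == 0)
        return 0;                   /* assumption: empty string is invalid */
    if ((value & 0xFFFu) != 0)
        return 0;                   /* not a multiple of 4096 */
    *out = value;
    return 1;
}
```

For instance, `BaB000` is accepted and yields `0xBAB000`, while `1004` is rejected because its twelve low bits are not all zero.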
The `global_options` variable of type `unsigned int` is used to record the mode
of operation (i.e. assemble/disassemble) of the program, as well as any selected flags
and base address. This is done as follows:
- If the `-h` flag is specified, the least significant bit is 1
- The second least significant bit is 0 if `-a` is passed (i.e. the user wants
assembly mode) and 1 if `-d` is passed (i.e. the user wants disassembly mode)
- The third least significant bit is 1 if `-e b` is passed (i.e. the user wants
big-endian byte ordering) and 0 otherwise.
- If the `-b` option was specified, then the base address is given by taking
the value of `global_options` and clearing the 12 least significant bits.
If the `-b` option was not specified, then the 20 most significant bits of
`global_options` should all be 0 (i.e. the default base address is 0).
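
The bit layout described above can be sketched as a handful of masks and a packing function. The macro and function names here are assumptions made for illustration; they are not taken from `const.h`, and the sketch assumes a 32-bit `unsigned int`.

```c
/* Illustrative bit masks; these names are assumptions, not from const.h. */
#define HELP_BIT        0x1u         /* bit 0: -h was given */
#define DISASSEMBLE_BIT 0x2u         /* bit 1: 0 = assemble, 1 = disassemble */
#define BIG_ENDIAN_BIT  0x4u         /* bit 2: -e b was given */
#define BASEADDR_MASK   0xFFFFF000u  /* bits 31:12: base address */

/* Hypothetical helper: builds an options word from the parsed flags. */
unsigned int pack_options(int help, int disassemble, int big_endian,
                          unsigned int base_addr) {
    unsigned int opts = base_addr & BASEADDR_MASK;
    if (help)        opts |= HELP_BIT;
    if (disassemble) opts |= DISASSEMBLE_BIT;
    if (big_endian)  opts |= BIG_ENDIAN_BIT;
    return opts;
}

/* Recovers the base address by clearing the 12 least significant bits. */
unsigned int base_address_of(unsigned int opts) {
    return opts & BASEADDR_MASK;
}
```

Because a valid base address always has its twelve low bits clear, the flag bits and the address can share one word without interfering with each other.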
If `validargs` returns 0 indicating failure, your program must print
`USAGE(program_name, return_code)` and return `EXIT_FAILURE`.
**Once again, `validargs` must always return, and therefore it must not
call the `USAGE(program_name, return_code)` macro itself.
That should be done in `main`.**
If `validargs` sets the least significant bit of `global_options` to 1
(i.e. the `-h` flag was passed), your program must print `USAGE(program_name,
return_code)` and return `EXIT_SUCCESS`.
> :nerd: The `USAGE(program_name, return_code)` macro is already defined for you
> in `const.h`.
If `validargs` returns 1 and the least significant bit of `global_options` is 0,
your program must perform assembly or disassembly accordingly and return
`EXIT_SUCCESS` upon successful completion, or `EXIT_FAILURE` in case of an error.
If `-b` is provided, you must check to confirm that the specified base address
is valid.
If `-e` is provided, you must check that the specified endianness is either
the single character `b` or the single character `l`.
> :nerd: Remember `EXIT_SUCCESS` and `EXIT_FAILURE` are defined in `<stdlib.h>`.
> Also note, `EXIT_SUCCESS` is 0 and `EXIT_FAILURE` is 1.
> :nerd: We suggest that you create functions for each of the operations defined
> in this document. Writing modular code will help you isolate and fix
> problems.
### Sample validargs Execution
The following are examples of `global_options` settings for given inputs.
Each input is a bash command that can be used to run the program.
In the examples, all don't care bits (bits 3-11, where the least significant
bit is numbered 0 and the most significant bit is numbered 31) have been set to 0.
- Input: `bin/hw1 -h`. Setting: 0x1 (`help` bit is set. All other bits are
don't cares.)
- Input: `bin/hw1 -d`. Setting: 0x2 (`disassemble` bit is set).
- Input: `bin/hw1 -d -e b`. Setting: 0x6 (`disassemble` and
`big endian` bits are set).
- Input: `bin/hw1 -d -e b -b BaB000`. Setting: 0xBAB006 (`disassemble` and
`big endian` bits are set, base address is 0xBAB000).
- Input: `bin/hw1 -e b -d -b BaB000`. Setting: 0x0. This is an error
case because the argument ordering is invalid (`-e` is before `-d`).
In this case `validargs` returns 0, leaving `global_options` unset.
# Part 2: MIPS Instruction Format
Presumably you learned something about the MIPS processor and its instruction set
in CSE 220. If you need to, review the materials used for that course.
You might also find useful information via
[this link](https://en.wikipedia.org/wiki/MIPS_architecture#Instruction_formats) or
[this one](https://www.cs.cornell.edu/courses/cs3410/2008fa/MIPS_Vol2.pdf).
Below we summarize the information about the MIPS instruction format that will
be needed to do the assignment.
Each MIPS instruction consists of one 32-bit word. We will number the bits
from 0 (least significant bit) to 31 (most significant bit) and we will think
of bit 31 as being "leftmost". To indicate a particular bit field from
the instruction word we will use a notation like 31:26, which indicates
bits 31 down to 26; that is, the 6 "leftmost", or most significant bits.
In every MIPS instruction, bit field 31:26 is used as a 6-bit opcode.
Most instructions are directly identified by one of the 64 possible values
of this field, but as we will see there are some special cases.
There are three types of MIPS instructions: R, I, and J.
Instructions of type R take up to three registers as arguments.
Instructions of type I take up to two registers and a 16-bit immediate value
(obtained from the 16 least significant bits of the instruction word).
Instructions of type J take a jump target from the 26 least significant
bits of the instruction word.
The MIPS processor has 32 registers, which means that it takes 5 bits to
specify a register.
The registers are specified by the contents of bit fields
25:21 (called `RS`), 20:16 (called `RT`), and 15:11 (called `RD`),
or, in some cases, bit field 10:6.
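
Extracting a bit field written in the `hi:lo` notation comes down to a shift and a mask. The sketch below shows one way to do it; the helper name `bits` is hypothetical, and `unsigned int` is assumed to be 32 bits wide.

```c
/* Hypothetical helper: returns bits hi..lo (inclusive) of a 32-bit word,
 * right-justified. For example, bits(word, 25, 21) yields the RS field. */
unsigned int bits(unsigned int word, int hi, int lo) {
    unsigned int width = (unsigned int)(hi - lo + 1);
    /* Avoid shifting by 32, which is undefined for a 32-bit type. */
    unsigned int mask = (width >= 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (word >> lo) & mask;
}
```

Applied to the word `0x00c72820` from Example 1, `bits(w, 25, 21)`, `bits(w, 20, 16)`, and `bits(w, 15, 11)` give 6, 7, and 5 respectively.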
In the files `instruction.h` and `instr_table.c` you have been provided
with a set of tables that can be used to decode MIPS binary instruction words.
Rather than going through full details of the MIPS instruction format,
we will just go through the procedure for decoding an instruction using the
tables.
The type `Opcode` is an enumerated type that assigns to integer values
in the range 0 to 63 the names of MIPS instructions, and in addition defines
names for three additional values `SPECIAL` (64), `BCOND` (65), and `ILLEGL` (66).
`Opcode` values in the range 0 to 63 serve as indices into the
instruction table `instrTable`. Each entry in this table uniquely identifies
a particular type of MIPS instruction and provides further information about it.
Our first objective in decoding an instruction is to determine the proper
`Opcode` value (in the discussion below we refer to this as "the Opcode"),
thereby obtaining access to the proper entry from the instruction table.
The starting point for obtaining the Opcode is the value in bits 31:26
of the instruction word.
This value is used as an index into `opcodeTable` and the value (of type `Opcode`)
at that index in the table is retrieved.
- If the value obtained from `opcodeTable` is neither `SPECIAL` nor `BCOND`,
then it is the Opcode.
- If the value obtained from `opcodeTable` is `SPECIAL` (this occurs when the
value of bits 31:26 is 000000), then the value in bits 5:0 of the instruction
word is used as an index into the table `specialTable` to obtain the Opcode.
- If the value obtained from `opcodeTable` is `BCOND`, then the value in bits
20:16 is examined. If the value is 00000, 00001, 10000, or 10001,
then the Opcode is `OP_BLTZ`, `OP_BGEZ`, `OP_BLTZAL`, or `OP_BGEZAL`, respectively,
otherwise it is an error.
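
The `BCOND` case above can be sketched as a switch on bits 20:16. The enum below is only a stand-in so that the example compiles on its own; the real `Opcode` type comes from `instruction.h`. Returning `ILLEGL` to signal the error case is an assumption of this sketch, as is the function name `bcond_opcode`.

```c
/* Stand-in for the real Opcode enum in instruction.h, which is not
 * reproduced here; only the names used by this sketch are declared. */
typedef enum { OP_BLTZ, OP_BGEZ, OP_BLTZAL, OP_BGEZAL, ILLEGL } BcondOpcode;

/* Hypothetical helper: resolves a BCOND instruction word to its Opcode. */
BcondOpcode bcond_opcode(unsigned int word) {
    switch ((word >> 16) & 0x1Fu) {   /* bits 20:16 */
    case 0x00: return OP_BLTZ;        /* 00000 */
    case 0x01: return OP_BGEZ;        /* 00001 */
    case 0x10: return OP_BLTZAL;      /* 10000 */
    case 0x11: return OP_BGEZAL;      /* 10001 */
    default:   return ILLEGL;         /* assumption: error marker */
    }
}
```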
Having determined the Opcode, it is then used as an index into `instrTable`
and the corresponding `Instr_info` structure is retrieved.
What happens next depends on the value of the `type` field.
This value can be `NTYP` (which occurs in a few entries of the table that
do not correspond to actual instructions), `RTYP`, which indicates an
instruction of type R, `ITYP`, which indicates an instruction of type I,
and `JTYP`, which indicates an instruction of type J.
The next task is to determine the sources of the instruction arguments.
For this, the information in the `srcs` field of the `Instr_info` structure is used.
This field consists of an array of three values of type `Source`.
The first entry in this array specifies the source of the first instruction
argument, the second entry specifies the source of the second instruction
argument, and the third entry specifies the source of the third argument.
There are five possible source values: `RS`, `RT`, `RD`, `EXTRA`, and `NSRC`.
The value `RS` indicates that the argument source is the register specified
by the RS field of the instruction word.
Similarly, the values `RT` and `RD` indicate that the argument source is the register
specified by the RT or RD field of the instruction word, respectively.
The value `EXTRA` indicates that the argument value has to be decoded
from the instruction word in a way that depends on the particular type
of instruction.
The value `NSRC` is used as a place-holder value for instructions that take fewer
than three arguments.
For arguments with source `EXTRA`, the actual argument is determined as follows:
- If the Opcode is `OP_BREAK`, then the argument consists of the 20-bit
value in bits 25:6 of the instruction word.
- For instructions of type R, the argument consists of the 5-bit value
in bits 10:6 of the instruction word.
- For instructions of type I, the argument is obtained by extracting the 16-bit
value in bits 15:0, treating bit 15 as a sign bit,
and performing sign-extension to a 32-bit signed integer.
For non-branch instructions of type I (such as `ADDI`),
this 32-bit signed integer is the immediate argument to the instruction.
For the conditional branch instructions
`BEQ`, `BGEZ`, `BGEZAL`, `BGTZ`, `BLEZ`, `BLTZ`, `BLTZAL`, `BNE`,
the 32-bit signed integer value is further processed by shifting it left by two bits
(which amounts to multiplication by 4) and then treating it as a PC-relative branch
offset. It is added to the current value of the PC register (this will be the memory
address at which the instruction "lives", plus 4) to obtain an absolute address
which is the branch target.
- For instructions of type J, the argument is obtained by extracting the 26-bit
value in bits 25:0 of the instruction word and treating it as an unsigned integer.
This value is shifted left by two bits and then added to the value obtained from
the PC by zeroing the 28 least significant bits, to obtain an absolute address
that is the jump target. (As above, the PC value is given by the memory address
of the instruction, plus 4.)
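
The type I and type J rules above can be sketched as three small functions. The function names are hypothetical, and the sketch assumes that `int` and `unsigned int` are 32 bits wide. The branch offset is multiplied by 4 rather than shifted, because left-shifting a negative signed value is undefined in C.

```c
/* Sign-extends the 16-bit immediate in bits 15:0 to a 32-bit signed value. */
int sign_extend16(unsigned int word) {
    unsigned int imm = word & 0xFFFFu;
    return (imm & 0x8000u) ? (int)imm - 0x10000 : (int)imm;
}

/* Absolute target of a conditional branch located at address addr. */
unsigned int branch_target(unsigned int word, unsigned int addr) {
    int offset = sign_extend16(word) * 4;      /* same as shifting left by 2 */
    return addr + 4u + (unsigned int)offset;   /* PC is addr + 4 */
}

/* Absolute target of a J or JAL instruction located at address addr. */
unsigned int jump_target(unsigned int word, unsigned int addr) {
    unsigned int pc = addr + 4u;
    /* Keep the top 4 bits of the PC, add the 26-bit field shifted left by 2. */
    return (pc & 0xF0000000u) + ((word & 0x03FFFFFFu) << 2);
}
```

With the word `0x10effc1f` at address `0x1000`, `branch_target` gives 128. With `0x08000400` at address `0x40000000`, `jump_target` gives `0x40001000`. Both agree with Examples 3 and 4.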
### Example 1:
The instruction word is 0x00c72820, which when written in binary is:
> <pre>
> 0000 0000 1100 0111 0010 1000 0010 0000
> OOOO OOSS SSST TTTT DDDD D      FF FFFF
> </pre>
The letters written underneath the bits indicate the various bit fields:
`O` for the opcode field in bits 31:26,
`S` for the RS field in bits 25:21,
`T` for the RT field in bits 20:16,
`D` for the RD field in bits 15:11, and
`F` for the function code in bits 5:0.
The value in bits 31:26 is 000000; i.e. 0.
Using this as an index into `opcodeTable` yields `SPECIAL`,
so it is then necessary to use the value in bits 5:0 as an index into `specialTable`.
This index is 100000, or 32 in decimal, and the entry at that index is `OP_ADD`.
The corresponding entry in `instrTable` indicates that the `ADD` instruction is
of type R, and that the three arguments are given by RD, RS, and RT.
The value of RD (in bits 15:11) is 00101 indicating that the first argument is
register 5.
The value of RS (in bits 25:21) is 00110 indicating that the second argument is
register 6.
The value of RT (in bits 20:16) is 00111 indicating that the third argument is
register 7.
So the mnemonic form of this instruction is `add $5,$6,$7`.
### Example 2:
The instruction word is 0x8cc50007, which when written in binary is:
> <pre>
> 1000 1100 1100 0101 0000 0000 0000 0111
> OOOO OOSS SSST TTTT XXXX XXXX XXXX XXXX
> </pre>
The value in bits 31:26 is 100011, or 35.
Using this as an index into `opcodeTable` yields `OP_LW`.
The corresponding entry from `instrTable` indicates that the
instruction is of type I, with first argument source `RT`,
second argument source `EXTRA`, and the third argument source `RS`.
RT is 00101 so the first argument is register 5.
RS is 00110 so the third argument is register 6.
The second argument is obtained from bits 15:0, which have the value 7.
So the mnemonic form of this instruction is `lw $5,7($6)`.
### Example 3:
The instruction word is 0x10effc1f, which when written in binary is:
> <pre>
> 0001 0000 1110 1111 1111 1100 0001 1111
> OOOO OOSS SSST TTTT XXXX XXXX XXXX XXXX
> </pre>
The value in bits 31:26 is 000100, or 4.
Using this as an index into `opcodeTable` yields `OP_BEQ`.
The corresponding entry from `instrTable` indicates that the
instruction is of type I, with first argument source `RS`,
second argument source `RT`, and the third argument source `EXTRA`.
RS is 00111 so the first argument is register 7.
RT is 01111 so the second argument is register 15.
The third argument is obtained from bits 15:0, which is `fc1f` in hex.
This 16-bit value is sign-extended to the 32-bit signed value
`fffffc1f`, which is then shifted left two bits to obtain `fffff07c`,
or -3972 in decimal. This is the PC-relative branch offset.
This offset is added to the current value of the PC
(i.e. the memory address of the instruction, plus 4) to obtain
the final absolute address that is the branch target.
Assuming the memory address of this instruction is `1000` in hex,
or 4096 in decimal, the branch target is 4096 + 4 - 3972, or 128 in decimal.
So the mnemonic form of this instruction is `beq $7,$15,128`.
### Example 4:
The instruction word is 0x08000400, which when written in binary is:
> <pre>
> 0000 1000 0000 0000 0000 0100 0000 0000
> OOOO OOXX XXXX XXXX XXXX XXXX XXXX XXXX
> </pre>
The value in bits 31:26 is 000010, or 2.
Using this as an index into `opcodeTable` yields `OP_J`.
The value in bits 25:0 is 0000400, which is shifted left two bits to obtain
`00001000` in hex.
Assuming that the memory address of the instruction is `40000000` in hex,
the PC value at the time of execution would be `40000004`.
Clearing the 28 least-significant bits yields `40000000`, and adding this
to `00001000` yields `40001000` in hex.
So the mnemonic form of this instruction is `j 0x40001000`.
> :scream: The MIPS instruction set does not support jumps to addresses
> whose four most-significant bits differ from those of the current PC value.
> Consequently, an attempt to assemble a jump instruction (i.e. `J` or `JAL`)
> whose target address differs in its four most-significant bits from the base
> address supplied with `-b` should be treated as an error.
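
When assembling a jump, the range restriction above becomes a check on the four most significant bits. The following is a minimal sketch. The name `jump_field` is hypothetical, and the sketch compares the target against the PC of the instruction itself (`addr + 4`). That gives the same result as comparing against the base address as long as the code does not cross a 256 MB boundary, which is an assumption here.

```c
/* Hypothetical helper for assembling J/JAL: computes the 26-bit field for
 * a jump to `target` from an instruction located at address `addr`.
 * Returns 1 and stores the field in *field on success, 0 on error.
 * Assumes unsigned int is 32 bits wide. */
int jump_field(unsigned int target, unsigned int addr, unsigned int *field) {
    unsigned int pc = addr + 4u;
    if ((target & 0xF0000000u) != (pc & 0xF0000000u))
        return 0;                 /* top four bits differ: not encodable */
    /* Drop the top four bits, then undo the shift left by two. */
    *field = (target & 0x0FFFFFFFu) >> 2;
    return 1;
}
```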
# Part 3: **Required** `encode` and `decode` functions
In order to provide some additional structure for you, as well as to make it
possible for us to perform additional unit tests on your program,
you are required to implement the two functions below as part of your program.
The prototypes for these functions are given in `const.h`.
Once again, you **MUST** implement these functions as part of the assignment,
as we will be testing them separately.
<pre>
/**
* @brief Computes the binary code for a MIPS machine instruction.
* @details This function takes a pointer to an Instruction structure
* that contains information defining a MIPS machine instruction and
* computes the binary code for that instruction. The code is returned
* in the "value" field of the Instruction structure.
*
* @param ip The Instruction structure containing information about the
* instruction, except for the "value" field.
* @param addr Address at which the instruction is to appear in memory.
* The address is used to compute the PC-relative offsets used in branch
* instructions.
* @return 1 if the instruction was successfully encoded, 0 otherwise.
* @modifies the "value" field of the Instruction structure to contain the
* binary code for the instruction.
*/
int encode(Instruction *ip, unsigned int addr);
</pre>
<pre>
/**
* @brief Decodes the binary code for a MIPS machine instruction.
* @details This function takes a pointer to an Instruction structure
* whose "value" field has been initialized to the binary code for
* a MIPS machine instruction, and it decodes the instruction to obtain
* details about the type of instruction and its arguments.
* The decoded information is returned by setting the other fields
* of the Instruction structure.
*
* @param ip The Instruction structure containing the binary code for
* a MIPS instruction in its "value" field.
* @param addr Address at which the instruction appears in memory.
* The address is used to compute absolute branch addresses from the
* PC-relative offsets that occur in the instruction.
* @return 1 if the instruction was successfully decoded, 0 otherwise.
* @modifies the fields other than the "value" field to contain the
* decoded information about the instruction.
*/
int decode (Instruction *ip, unsigned int addr);
</pre>
These functions each take as an argument a pointer to a structure of type
`Instruction`, which is defined in `instruction.h`.
The `decode` function assumes that the `value` field has been set to the
binary code for a MIPS instruction, and it decodes this value to fill in
the other fields.
The `info` field should be set to a pointer to the appropriate
`Instr_info` structure obtained from `instrTable`.
The entries of the `regs` array should be set to the contents of the
RS, RT, and RD fields of the instruction word. (**Note:** these should
always be set even if the particular instruction does not use those fields.)
The `extra` field should be set to the "extra" argument decoded from the
instruction word. As this is done differently for each type of instruction,
this does not have to be set unless the instruction uses `EXTRA` as an
argument source.
The entries of the `args` field should be set to the final values of the
instruction arguments, as required in order to print out the instruction in
mnemonic form (i.e. `args[0]` corresponds to the first `%` in the
`format` string, `args[1]` corresponds to the second `%`,
and `args[2]` to the third). If the instruction takes fewer than three
arguments, the unused entries (i.e. the ones with src `NSRC`) should be
set to 0.
The `encode` function does the inverse operation from `decode`:
it assumes that all the fields other than `value` have been set,
and it computes the binary code for the instruction and stores
it in the `value` field.
> :scream: You should not define additional tables to help you map
> instruction mnemonics to `Opcode` values for implementing
> `encode`. Instead, to perform this mapping you should use a linear
> search of the existing `instrTable`. You should use the `sscanf`
> function to match a format string from the `instrTable` against a
> mnemonic instruction read from the input. The mapping defined by
> the `specialTable` should be inverted using a similar linear scan
> approach.
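
As a rough illustration of the `sscanf` matching step, the sketch below tries one format string against one input line. The format used in the usage note is an assumed example; the actual strings are the ones stored in `instrTable` and may differ. The function name `match_three_regs` is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: tries to match an input line against a format
 * string that contains three integer conversions. Returns 1 if all three
 * arguments were converted, 0 otherwise. */
int match_three_regs(const char *line, const char *format, int args[3]) {
    return sscanf(line, format, &args[0], &args[1], &args[2]) == 3;
}
```

For example, `match_three_regs("add $5,$6,$7", "add $%d,$%d,$%d", args)` succeeds and fills `args` with 5, 6, and 7, while the same format fails against `sub $5,$6,$7`. A linear search would repeat this test for each table entry in turn. Note that the number of conversions to expect depends on the instruction, so comparing against a fixed count of 3 only suits three-argument formats.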
The implementation of `validargs`, `encode`, and `decode` constitutes most
of the work involved in implementing the program. Once these have been written,
finishing the program should be easy.
One requirement we have yet to consider is the endianness option.
Recall that "endianness" refers to the order in which the bytes in a multi-byte
quantity are stored in memory or written to a file.
In "little-endian" byte order, the **least-significant** byte is stored at the
lowest-numbered memory address or written first to a file.
In "big-endian" byte order, the **most-significant** byte is stored or written
first.
The default mode of operation of your program should be to use little-endian
byte order for reading and writing binary MIPS code. However, if the `-e b` option
is specified, then big-endian ordering should be used instead.
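
One portable way to honor the byte order option is to split each word into bytes explicitly instead of relying on the byte order of the host machine. The sketch below assumes a 32-bit `unsigned int`; both function names are hypothetical.

```c
/* Splits a 32-bit word into four bytes in the requested order.
 * out[0] is the byte that would be written first. */
void word_to_bytes(unsigned int word, int big_endian, unsigned char out[4]) {
    for (int i = 0; i < 4; i++) {
        int shift = big_endian ? 8 * (3 - i) : 8 * i;
        out[i] = (unsigned char)((word >> shift) & 0xFFu);
    }
}

/* Reassembles a 32-bit word from four bytes that were read in order. */
unsigned int bytes_to_word(const unsigned char in[4], int big_endian) {
    unsigned int word = 0;
    for (int i = 0; i < 4; i++) {
        int shift = big_endian ? 8 * (3 - i) : 8 * i;
        word |= (unsigned int)in[i] << shift;
    }
    return word;
}
```

The four bytes could then be written one at a time with `putchar`, or read one at a time with `getchar`, so that the file contents do not depend on the machine the program runs on.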
# Part 4: Running the Completed Program
In either assembly or disassembly mode, the program reads from `stdin` and writes
to `stdout`. In assembly mode, since the input is text, it is possible to enter
assembly code directly from the terminal:
> <pre>
> $ bin/hw1 -a
> add $5,$6,$7
> j 0x1000
> </pre>
**NOTE:** In the above example, the program assembles one line at a time
and stops assembling when it reaches the end of its input. Entering `^d` (control-d)
into a terminal in a UNIX system signals an `EOF` (end of file) to the program.
If you run the program in assembly mode this way, the binary output of the
program will also be sent to the terminal. This binary data will appear
as "garbage" in the output. To avoid this, the binary output should be redirected,
either to a file or else via a pipe to a program that can produce a printable
representation of it.
To redirect the output to a file `hw1.out`, you can use:
> <pre>
> $ bin/hw1 -a > hw1.out
> add $5,$6,$7
> j 0x1000
> $ echo $?
> 0
> </pre>
> :nerd: The `>` symbol tells the shell to perform "output redirection":
> the file `hw1.out` is created (or truncated if it already existed -- be careful!)
> and the output produced by the program is sent to that file instead
> of to the terminal.
> :nerd: `$?` is a special shell parameter in bash that holds the return code of
> the previous program run. In the above, the `echo` command is used to display
> its value.
The contents of `hw1.out` can then be viewed using the `od` ("octal dump") command:
> <pre>
> $ od -X hw1.out
> 0000000 00c72820 08000400
> 0000010
> </pre>
> :nerd: The `-X` flag instructs `od` to interpret the file as a sequence of 32-bit
> words, which are printed as 8-digit hexadecimal values. In this case, the file
> contains two such words: `00c72820` and `08000400`. The values in the first
> column indicate the offsets from the beginning of the file, specified as 7-digit
> octal (base 8) numbers.
Alternatively, the output of the program could be redirected via a "pipe" to
the `od` command, without using any file:
> <pre>
> $ bin/hw1 -a | od -X
> add $5,$6,$7
> j 0x1000
> 0000000 00c72820 08000400
> 0000010
> </pre>
> :nerd: In this case, you won't see the output produced by `od` until
> `^d` has been typed, because when the output of a program is redirected to
> a pipe the system assumes that the program is being run non-interactively,
> so for efficiency it buffers a larger amount of the output rather than
> emitting it a line at a time.
In disassembly mode, it is not very useful to read the input from the terminal,
since it would be very difficult to generate the necessary binary data using
the keyboard. Instead, the input should be redirected *from* a file:
> <pre>
> $ bin/hw1 -d < hw1.out
> add $5,$6,$7
> j 0x1000
> </pre>
Finally, a pipe can be used to assemble and disassemble in a single run.
This is one way to test whether your program is working properly:
> <pre>
> $ bin/hw1 -a | bin/hw1 -d
> add $5,$6,$7
> j 0x1000
> add $5,$6,$7
> j 0x1000
> </pre>
The output should be identical to the input if the program is working properly.
## Testing Your Program
In testing your program, it is useful to be able to compare two files
to see if they have the same content. The `diff` command (use `man diff`
to read the manual page) is useful for comparison of text files.
On the other hand, the `cmp` command can be used to perform a byte-by-byte
comparison of two files, regardless of their content:
> <pre>
> $ cmp file1 file2
> </pre>
If the files have identical content, `cmp` exits silently.
If one file is shorter than the other, but the content is otherwise identical,
`cmp` will report that it has reached `EOF` on the shorter file.
Finally, if the files disagree at some point, `cmp` will report the
offset of the first byte at which the files disagree.
We can take this a step further and run an entire test without using any files:
> <pre>
> $ cmp <(echo "j 0x1000") <(echo "j 0x1000" | bin/hw1 -a | bin/hw1 -d)
> $ echo $?
> 0
> </pre>
> :nerd: `<(...)` is known as process substitution. It allows the output of the
> program(s) inside the parentheses to appear as a file for the outer program.
Because both strings are identical, `cmp` outputs nothing.
Finally, we can test the program on entire files with a similar command:
> <pre>
> $ cmp <(cat rsrc/bcond.asm) <(cat rsrc/bcond.asm | bin/hw1 -a -b 1000 | bin/hw1 -d -b 1000)
> $ echo $?
> 0
> </pre>
> :nerd: `cat` is a command that outputs a file to `stdout`.
## Unit Testing
Unit testing is a part of the development process in which small testable
sections of a program (units) are tested individually to ensure that they are
all functioning properly. This is a very common practice in industry and is
often a requested skill by companies hiring graduates.
> :nerd: Some developers consider testing to be so important that they use a
> work flow called test driven development. In TDD, requirements are turned into
> failing unit tests. The goal is then to write code to make these tests pass.
This semester, we will be using a C unit testing framework called
[Criterion](https://github.com/Snaipe/Criterion), which will give you some
exposure to unit testing. We have provided a basic set of test cases for this
assignment.
The provided tests are in the `tests/hw1_tests.c` file. These tests do the
following:
- `validargs_help_test` ensures that `validargs` sets the help bit
correctly when the `-h` flag is passed in.
- `validargs_disassem_test` ensures that `validargs` sets the Disassembly bit
correctly when the `-d` flag is passed in.
- `help_system_test` uses the `system` library function to execute your program through
Bash and checks to see that your program returns with `EXIT_SUCCESS`.
### Compiling and Running Tests
When you compile your program with `make`, a `hw1_tests` executable will be
created in your `bin` directory alongside the `hw1` executable. Running this
executable from the `hw1` directory with the command `bin/hw1_tests` will run
the unit tests described above and print the test outputs to `stdout`. To obtain
more information about each test run, you can use the verbose print option:
`bin/hw1_tests --verbose=0`.
The tests we have provided are very minimal and are meant as a starting point
for you to learn about Criterion, not to fully test your homework. You may write
your own additional tests in `tests/hw1_tests.c`. However, this is not required
for this assignment. Criterion documentation for writing your own tests can be
found [here](http://criterion.readthedocs.io/en/master/).
# Hand-in instructions
**TEST YOUR PROGRAM VIGOROUSLY!**
Make sure your directory tree looks like this and that your homework compiles:
<pre>
hw1
├── include
│ ├── const.h
│ ├── debug.h
│ ├── hw1.h
│ ├── instruction.h
│ └── ... Any additional .h files you defined
├── Makefile
├── rsrc
│ ├── bcond.asm
│ ├── bcond.bin
│ ├── examples.asm
│ ├── examples.bin
│ ├── jump.asm
│ ├── jump.bin
│ ├── matmult.asm
│ ├── matmult.bin
│ ├── typei.asm
│ ├── typei.bin
│ ├── typer.asm
│ ├── typer.bin
│ └── ... Any sample text files given or created (will not be used in grading)
├── src
│ ├── hw1.c
│ ├── instr_table.c
│ └── main.c
└── tests
├── hw1_tests.c
└── ... Any additional criterion test files you may have written
</pre>
This homework's tag is: `hw1`
`$ git submit hw1`
> :nerd: When writing your program try to comment as much as possible. Try to
> stay consistent with your formatting. It is much easier for your TA and the
> professor to help you if we can figure out what your code does quickly!