Starting from:
$35

$29

Project 1: A Compiler for the TinyL Language Solution

THIS IS NOT A GROUP PROJECT! You may talk about the project and possible solutions in general terms, but must not share code. In this project, you will be asked to write a recursive descent LL(1) parser and code generator for the tinyL language. Your compiler will generate RISC machine instructions. You will also write a code optimizer that takes RISC machine instructions as input and removes redundant code. The output of the optimizer is a sequence of RISC machine instructions which produces the same results as the original input sequence but is more e cient. To test your generated programs, you will use a provided virtual machine that can \run" your RISC code. The project will require you to manipulate doubly-linked lists of instructions. In order to avoid memory leaks, explicit deallocation of unused memory space is necessary.


    • Background

1.1    The tinyL language

tinyL is a simple expression language that allows assignments and basic I/O operations.

<program>
::=
<stmt


list> !
















<stmt


list>
::=
<stmt> <morestmts>

<morestmts>
::=
; <stmt


list> j




<stmt>
::=
<assign> j <read> j <print>
<assign>
::=
<var> = <expr>



<read>
::=  % <var>







<print>
::=
$ <var>







<expr>
::=
<arith

expr> j












<logical

expr> j











<var> j















<digit>







<arith

expr>
::=
+ <expr> <expr> j


















<expr> <expr> j










<expr> <expr>



<logical

expr>
::=
& <expr> <expr> j










j <expr> <expr>

j

<var>
::=
a
j
b



j
c
j
d
j
e

f
<digit>
::=
0
j
1



j
2
j
3
j
4
j
5 j 6 j 7 j 8 j 9

Two examples of valid tinyL programs:

    1. %a;%b;c=&3*ab;d=+c1;$d!


1
    2. %a;b=-*+1+2a58;$b!

1.2    Target Architecture

The target architecture is a simple RISC machine with virtual registers, i.e., with an un-bounded number of registers. All registers can only store integer values. A RISC architec-ture is a load/store architecture where arithmetic instructions operate on registers rather than memory operands (memory addresses). This means that for each access to a memory location, a load or store instruction has to be generated. Here is the machine instruction set of our RISC target architecture. Rx , Ry , and Rz represent three arbitrary, but distinct registers.


instr. format
description
semantics















memory instructions






LOADI Rx  #<const>
load constant value #<const> into register Rx
Rx
<const>
LOAD Rx  <id>
load value of variable <id> into register Rx
Rx
<id>
STORE <id> Rx
store value of register Rx into variable <id>
<id>
Rx









arithmetic instructions








ADD Rx
Ry
Rz
add contents of registers Ry and Rz , and
Rx
Ry + Rz



store result into register Rx


SUB Rx
Ry
Rz
subtract contents of register Rz from register
Rx
Ry    Rz



Ry , and store result into register Rx

Ry   Rz
MUL Rx
Ry
Rz
multiply contents of registers Ry and Rz , and
Rx




store result into register Rx











logical instructions








AND Rx
Ry
Rz
apply bit wise AND operation to
Rx
Ry & Rz



contents of registers Ry and Rz , and store





result into register Rx

Ry j Rz
OR Rx
Ry
Rz
apply bit wise OR operation to contents
Rx




of registers Ry and Rz , and store





result into register Rx











I/O instructions





READ <id>
read value of variable <id> from standard input
read( <id> )
WRITE <id>
write value of variable <id> to standard output
print( <id> )







1.3    Dead Code Elimination

Our tinyL language does not contain any control ow constructs (e.g.: jumps, if-then-else, while). This means that every generated instruction will be executed. However, if the execution of an operation or instruction does not contribute to the input/output behavior of the program, the instruction is considered \dead code" and therefore can be eliminated without changing the semantics of the program. Please note that READ instructions may not be eliminated although the read value may have no impact on the outcome of the program.

2

All READ instructions are kept since the overall input/output program behavior needs to be preserved.

Your dead-code eliminator will take a list of RISC instructions as input. For example, in the following code


LOADI Rx  #c1
LOADI Ry  #c2
LOADI Rz  #c3
ADD R1  Rx  Ry
MUL R2  Rx  Ry
STORE a R1
WRITE a

the MUL instruction and the third LOADI instruction can be deleted without changing the semantics of the program. Therefore, your dead-eliminator should produce the code:


LOADI Rx  #c1
LOADI Ry  #c2
ADD R1  Rx  Ry
STORE a R1
WRITE a


    • Project Description

The project consists of two components:

    1. Complete the partially implemented recursive descent LL(1) parser that generates RISC machine instructions.

    2. Write a dead-code eliminator that recognizes and deletes redundant, i.e., dead RISC machine instructions.

The project represents an entire programming environment consisting of a compiler, an optimizer, and a virtual machine (RISC machine interpreter). You are required to implement the compiler and the optimizer (described in Section 2.1 and 2.2). The virtual machine is provided to you (described in Section 2.3).

2.1    Compiler

The recursive descent LL(1) parser implements a simple code generator. You should follow the main structure of the code as given to you in le Compiler.c. As given to you, the le


3

contains code for function digit, assign, and print, as well as partial code for function expr. As is, the compiler is able to generate code only for expressions that contain \+" operations and constants. You will need to add code in the provided stubs to generate correct RISC machine code for the entire program. Do not change the signatures of the recursive functions. Note: The left-hand and right-hand occurrences of variables are treated di erently.

2.2    Dead Code Elimination Optimization

The dead code elimination optimizer expect the input le to be provided at the standard input (stdin), and will write the generated code back to standard output (stdout).

The basic algorithm identi es \crucial" instructions. The initial crucial instructions are all READ and WRITE instructions. For all WRITE instruction, the algorithm has to detect all instructions that contribute to the value of the variable that is written out. First, you will need to nd the store instruction that stores a value into the variable that is written out. This STORE instruction is marked as critical and will reference a register, let’s say Rm. Then you will nd the instructions on which the computation of the register Rm depends on and mark them as critical. This marking process terminates when no more instructions can be marked as critical. If this basic algorithm is performed for all WRITE instructions, the instructions that were not marked critical can be deleted.

Instructions that are deleted as part of the optimization process have to be explicitly deallocated using the C free command in order to avoid memory leaks. You will implement your dead code elimination pass in the le Optimizer.c. All of your \helper" functions should be implemented in this le.

2.3    Virtual Machine

The virtual machine executes a RISC machine program. If a READ <id> instruction is executed, the user is asked for the value of <id> from standard input (stdin). If a WRITE <id> instruction is executed, the current value of <id> is written to standard output (stdout). The virtual machine is implemented in le Interpreter.c. DO NOT MODIFY this le. It is there only for your convenience so that you may be able to copy the source code of the virtual machine, for instance, to your laptop and compile it there. Be aware that you are welcome to work on your project using your own computer, however, you need to make sure your code will eventually compile and run on the ilab cluster. All grading will be done on ilab.

The virtual machine assumes that an arbitrary number of registers are available (called virtual registers), and that there are only six memory locations that can be accessed using variable names (‘a’ ... ‘e’ ‘f’). In a real compiler, an additional optimization pass maps virtual registers to the limited number of physical registers of a machine. This step is typically called register allocation. You do not have to implement register allocation in this project. The virtual machine (RISC machine language interpreter) will report the overall number of



4

executed instructions for a given input program. This allows you to assess the e ectiveness of your optimization component.


    • Grading

You will submit two les Optimizer.c and Compiler.c. No other le should be modi ed, and no additional le(s) may be used. Please make a tarball of these two les: \tar -cvf proj1 submission.tar Optimizer.c Compiler.c" and submit the proj1 submission.tar le. Do not submit any executables or any of your test cases.


Your programs will be graded based mainly on functionality. Functionality will be veri ed through automatic testing on a set of syntactically correct test cases. No error handing is required. The project code package contains 10 test cases. Note that during grading we will use another 10 hidden test cases. Your grade will be based on these 20 test cases.

The project code package also contains executables of reference solutions for the compiler (compile.sol) and optimizer (optimize.sol).

A simple Make le is provided in the package for your convenience. In order to create the compiler, type make compile at the Linux prompt, which will generate the executable compile. The Make le contains rules to create executables of your optimizer (make optimize) and virtual machine (make run). The Make le also contains a rule that cleans all the object les before you recompile your code.

The provided, initial compiler contains some code for parsing a single assignment state-ment with right-hand side expressions of only additions of numbers, followed by a single print statement. The provided code is by no means complete (it is only meant to show you the code framework). You will need to extend it and support the complete TinyL language.


    • How To Get Started

Create your own directory on the ilab cluster, and copy the provided project code package to your directory. Make sure that the read, write, and execute permissions for groups and others are disabled (chmod go-rwx <directory name>).


Type make compile to generate the compiler. To run the compiler on a test case \sam-ple2.tinyL" in the folder tests/, type ./compile tests/sample2.tinyL. This will generate a RISC machine program in the le tinyL.out. To create your optimizer, type make optimize. The initial (provided) version of the optimizer does not work at all.

To call your optimizer on a le that contains RISC machine code, for instance le tinyL.out, type ./optimize < tinyL.out > optimized.out. This will generate a new le optimized.out containing the output of your optimizer. The operators \<" and \>" are Linux redirection operators for standard input (stdin) and standard output (stdout), re-spectively. Without those, the optimizer will expect instructions to be entered on the Linux command line, and will write the output to your screen.

You may want to use valgrind for memory leak detection. We recommend to use the following ags, in this case to test the optimizer for memory leaks:

5

valgrind --leak-check=full --show-reachable=yes --track-origins=yes ./optimize < tinyL.out


The RISC virtual machine (RISC machine program interpreter) can be generated by typing make run. The distributed version of the VM in Interpreter.c is complete and should not be changed. To run a program on the virtual machine, for instance tinyL.out, type ./run tinyL.out. If the program contains READ instructions, you will be prompted at the Linux command line to enter a value. Finally, you can de ne a tinyL language interpreter on a single Linux command line as follows:

./compile test1; ./optimize < tinyL.out > opt.out; ./run opt.out.

The \;" operator allows you to specify a sequence of Linux commands on a single command line.


    • Questions

Please post all questions regarding the project on Sakai forum.







































6

More products