Assignment 4 (PostScript Interpreter

Starting from:

~~$120~~

$114

Weight: The entire interpreter project (Part 1 and Part 2 together) will count for 12% of your course grade. Part 2 is worth 9%.

This assignment is to be your own work. Refer to the course academic integrity statement in the syllabus.

Download Starter Files

All the starter files provided for this assignment are attached to the HW4 Assignment 4 – Part2 Canvas dropbox as a zip file. The zip file includes several files, but all of your changes will be made to only four files: psParser.py , psItems.py, psOperators.py, and tests_part2.py.

Here are all the files included in the archive:

repl.py: implements the REPL for the interpreter

load.py: implements the loader for the interpreter. Loads and evaluates a collection of PostScript code examples.
psParser.py: implements the lexer and parser for PostScript input.

psItems.py: defines the classes which represent expressions (`Expr` and its subclasses) and evaluated values (`Value` and its subclasses).

psOperators.py: (not included) You will copy your part1 implementation of psOperators.py to the HW4_part2 folder. You need to add the cleanTop() method to this file (in the Operators class suite) . The definition of this method is given on page 3.
buffer.py: implements the Buffer class, used in psParser.py

utils.py: utility functions used in this assignment
tests_part2.py: Python unit tests for testing the interpreter output.
Colors.py : Color values you can use when you print in REPL environment.

Turning in your assignment

Copy your psOperators.py file from part-1 to the HW4_part2 folder and continue developing your code. I strongly encourage you to save a copy periodically so you can go back in time if you really mess something up. To submit your assignment, zip all the files as HW4_part2.zip and turn in your zip file by uploading on the dropbox on Canvas. Please don’t zip the HW4_part2 directory; your zip archive should only include the above 8 files. Also, exclude any directory that includes binary python bytecode (for example __pycache__).

The file that you upload must be named HW4_part2.py . At the top of the psOperators.py file in a comment, please include your name and the names of the students with whom you discussed any of the problems in this homework. This is an individual assignment and the final writing in the submitted file should be *solely yours*. You may NOT copy another student’s code or work together on writing code. You may not copy code from the web, or anything else that lets you avoid solving the problems for yourself.

You may turn in your assignment up to 3 times. Only the last one submitted will be graded.

Implement your code for Python 3. The TA will run all assignments using Python3 interpreter. You will lose points if your code is incompatible with Python 3.

The Problem

In this assignment you will write an interpreter in Python for a simplified PostScript-like language, concentrating on key computational features of the abstract machine, omitting all PS features related to graphics, and using a somewhat-simplified syntax. The simplified language, SPS, has the following features of PostScript language:

Constants:

integer constants, e.g. 1, 2, 3, -4, -5. We will represent integer constants as Python

“int” values in opstack and dictstack.

boolean constants, e.g. , true , false. We will represent boolean constants as Python

“bool” values in opstack and dictstack.

▪ array constants, e.g. [1 2 3 4], [-1 2 3 -4], [1 x 3 4 add 2 sub],

[1 [3 5 5] dup length], [1 2 x 4] where x is a variable. We will represent array constants as ArrayValue objects (see psItems.py file.) Note that arrays can also have subarrays.

name constants, e.g. /fact, /x. A name constant starts with a ‘/’ and letter followed by an arbitrary sequence of letters and numbers. We will represent name constants as Python “str” values in opstack and dictstack.

names to be looked up in the dictionary stack, e.g., fact, x; as for name constants, without the '/'

code constants (i.e., code-arrays); code between matched curly braces { ... }

Operators:

▪ built-in operators on numbers: add, sub, mul, mod, eq, lt, gt

built-in operators on array values: length, getinterval, putinterval, aload, astore. These operators should support ArrayValue arguments.

built-in conditional operators: if, ifelse (you will implement if/ifelse operators in Part2)

built-in loop operator: repeat (you will implement repeat operator in Part 2).
▪ stack operators: dup, copy, count, pop, clear, exch, roll, stack

dictionary creation operator: dict; takes one operand from the operand stack, ignores it, and creates a new, empty dictionary on the operand stack (we will call this psDict)

dictionary stack manipulation operators: begin, end.

begin requires one dictionary operand on the operand stack; end has no operands.

name definition operator: def. We will call this psDef.

stack printing operator: stack. Prints contents of opstack without changing it.
Part 2 - Requirements

The starter code provided to you partially implements a REPL (read-evaluate-print-loop) tool for interpreting PostScript code. In Part 2 you will complete the implementation of the following pieces in the starter code:

The Reader: You need to complete the reader in psParser.py. You will convert the PostScript tokens to their corresponding expression objects.
The Evaluator: You need to implement the `evaluate` methods of expression objects.

Executing (applying) code-arrays: You need to implement the apply method of FunctionValue. You will also implement if, ifelse operators (write the Python methods psIf and psIfelse), repeat loop operator, and forall operator. You will write these functions in the psOperators.py file from Part 1.

Running the Interpreter

Read-Eval-Print. You can run the interpreter in the REPL mode and interpret the PostScript code on the command prompt. The RELP tool is implemented in reply.py file. The interpreter reads PostScript expressions, evaluates them, and displays the results. You can run the REPL tool by executing the reply.py file, i.e.,

python repl.py (or python3 repl.py)

In repl.py, we use the following method of psstacks object to remove the `None` value from the top of the opstack. Make sure to add the following method to Operators class in psOperators.py.

def cleanTop(self):

if len(self.opstack)>1:

if self.opstack[-1] is None:

self.opstack.pop()

Load. Alternatively, you can run the interpreter on the “Loader” mode and run the loaded PostScript programs all at once. The “Loader” loads PostScript code (given as strings), evaluates them, and displays the results. You can run the “Loader” tool by running the following command on the command line:

python repl.py (or python3 repl.py)

Implementation Overview

Here is a brief overview of each of the Read-Eval-Print Loop components in our interpreter. Refer to this section as you work through the project as a reminder of how all the small pieces fit together!

Read: This step parses user input (a string of PostScript code) into our interpreter's internal Python representation of PostScript expressions.

Lexical analysis has already been implemented in the tokenize function in psParser.py. This function returns a list of tokens. The `read` function turns this list into a `Buffer` (defined in buffer.py) which allows to get the tokens one at a time.

Syntactic analysis happens in `read` and `read_expr` functions. Together, these mutually recursive functions parse tokens and converts them into our interpreter's internal Python representation of PostScript expressions.

In our interpreter, there are four types of expressions. psItems.py defines the classes that will represent each of these expressions and they are all subclasses of the `Expr` class:

Literal: represents primitive constants : integers or booleans . The `value` attribute contains the constant value the `Literal` refers to.

Name: represents names of variables, functions, or operators . The `var_name` attribute contains the name of the variable as a Python string, e.g., '/sq','sq','add','def'. If the `var_name` starts with a `/` character, Name represents a name constant, otherwise it represents a variable reference , function call, or a built-in operator call.

Array: represents arrays. The `value` attribute is a Python list that includes the elements of the PostScript array it represents, e.g., Array([Name('x'), Literal(1), Name('x'), Name('sub'), Literal(1)])

Block: represents body of a function or blocks of if, ifelse, repeat, or forall operators. The `value` attribute is a Python list that includes the tokens of the PostScript code-array (block) it represents , e.g., [Literal(10), Literal(5),Name(mul)]

The `read` function calls the `read_expr` function for each token in the buffer and returns a list of expression objects. You will complete `read_expr` function in psParser.py. `read_expr` creates and returns the expression object (i.e., one of the above) for the given token.

In repl.py and load.py , the interpreter calls `read` function to parse the tokens in PostScript code and turns them into a list of expressions.

Eval: This step evaluates PostScript expressions (represented in Python) to obtain values.

Each of the expression classes (i.e., `Literal`, `Name`, `Array`, and `Block`) implement their own `evaluate` methods returning the value that the expression evaluates to. You will complete the implementations of the `evaluate` methods for each of these classes in psItems.py. See “II. The Evaluator” section below for more details.

In repl.py and load.py, the interpreter calls the `evaluate` method of each expression to evaluate the expressions in the expression list.

Print: This step prints the __str__ representation of the obtained value.

Loop: The logic for the loop is handled by the while loop in repl.py. You don’t need to make any changes in repl.py.

The UML diagram on the next page illustrates the class level design for the interpreter.

I. The Reader

The first part of this assignment deals with reading and parsing SPS code. The reader is implemented in psParser.py. To parse the SPS programs,

We first pass the SPS code to the lexer which convert the continuous input text to a list of tokens. `tokenize` function in psParser.py implements the tokenizer. You do not need to make any changes in this part.

Example: Given the PostScript code ,

"""

/x 4 def

/square {dup mul} def

[x 1 x sub 1] /arr exch def

arr 1 1 getinterval aload pop dup 0 gt

{2 mul} {square} ifelse

"""

`tokenize` will return :

['/x', 4, 'def', '/square', '{', 'dup', 'mul', '}', 'def', '[', 'x', 1, 'x', 'sub', 1, ']', '/arr', 'exch', 'def', 'arr', 1, 1, 'getinterval', 'aload', 'pop', 'dup', 0, 'gt', '{', 2, 'mul', '}', '{', 'square', '}', 'ifelse']

Next, we parse the tokens, i.e., convert the tokens into our interpreter's internal Python representation of SPS expressions. The `read` and `read_expr` functions in psParser.py handle parsing.

The `read` function turns the token list (the output of `tokenize`) into a `Buffer` object (defined in buffer.py) which allows to get the tokens one at a time.

"""Parse an expression from a string. If the string does not contain an expression, None is returned. If the string cannot be parsed, a SyntaxError is raised.

"""

def read(s):

#reading one token at a time

src = Buffer(tokenize(s))

out = []

while src.current() is not None:

out.append(read_expr(src))

return out

`read` calls the `read_expr` function to convert each expression to its corresponding expression object. You need to complete the `read_expr` function and create the corresponding expression object for each token.

o As explained in section “Implementation Overview”, in our interpreter there are four types of expressions: Literal, Name, Array, and Block which are subclasses of the `Expr` object defined in psItems.py.

o In the `read_expr` function, you can use the helper functions provided in psParser.py to check the type of each token and to retrieve the tokens that will be included in the array or code-array values. Note that the tokens between `[` and `]` belong to an array and those between `{` and `} belong to a code array.

""" Converts the next token in the given Buffer to an expression. """

def read_expr(src):

token = src.pop_first()

if token is None:

raise SyntaxError('Incomplete expression')

TO-DO - complete the following; include each condition as an `elif` case.

if the token is a literal return a `Literal` object having `value` token.

if the token is a name, create a Name object having `var_name` token.

if the token is an array delimiter (i.e., '['), get all tokens until the

matching ']' delimiter and combine them as a Python list;

create an Array object having this list value.

if the token is a codearray delimiter (i.e., '{'), get all tokens until

the matching '}' delimiter and combine them as a Python list;

create a Block object having this list value.

else:

raise SyntaxError("'{}' is not the start of an expression".format(token))

Given the tokens,

['/x', 4, 'def', '/square', '{', 'dup', 'mul', '}', 'def', '[', 'x', 1, 'x', 'sub', 1, ']', '/arr', 'exch', 'def', 'arr', 1, 1, 'getinterval', 'aload', 'pop', 'dup', 0, 'gt', '{', 2, 'mul', '}', '{', 'square', '}', 'ifelse']

`read` should return:

[Name('/x'),Literal(4),Name('def'),Name('/square'),Block([Name('dup'),Name('mul')]),

Name('def'), Array([Name('x'), Literal(1), Name('x'), Name('sub'),Literal(1)]),

Name('/arr'), Name('exch'), Name('def'), Name('arr'), Literal(1), Literal(1),

Name('getinterval'), Name('aload'), Name('pop'), Name('dup'), Literal(0), Name('gt'),

Block([Literal(2), Name('mul')]), Block([Name('square')]), Name('ifelse')],

The above is the print of the __repr__ representation of the expression list `read` returns.

The Evaluator

Evaluator evaluates the PostScript expressions (represented as Python objects). In repl.py and load.py, the interpreter calls the `evaluate` method of each expression to evaluate the expressions in the expression list.

Each of the expression classes (i.e., Literal, Name, Array, and Block) implement their own `evaluate` method returning the value that the expression evaluates to. For example:

1. `Literal` object is evaluated to the integer or boolean value it represents (e.g. 15 or `True` or `False`) and pushed onto the opstack.

`Name` object is evaluated according to the following:

If the `Name` represents a name constant (i.e., its `var_name` attribute starts with a `/`), it will be evaluated to a Python string having value `var_name` . The evaluated value will be pushed onto the opstack.

If the `Name` represents a built-in operator (i.e., its `var_name` attribute is one of the built-in operator names), then we will evaluate it by executing the operator function defined in psOperators.py in the current environment (opstack).

If the `Name` represents a variable or function, interpreter looks up the value of the variable in the current environment (dictstack).

If the variable value is a function (`FunctionValue`), it should be applied (i.e., executed) by calling its `apply` method.

Otherwise, the variable value is a constant and it should be pushed onto the opstack.

`Array` object is evaluated to an `ArrayValue` value. The `value` of the `Array` needs to be evaluated on the opstack and the ArrayValue `value` attribute will be initialized to the evaluated list of array elements. For example: an `Array` with `value`

[Name('x'), Literal(1), Name('x'), Name('add'), Literal(True)] will be evaluated to `ArrayValue` with `value`

[10,11,True] assuming x’s value is 10.

The evaluated `ArrayValue` is pushed onto the opstack.

`Block` object is evaluated to `FunctionValue` value. For example: a `Block` with `value` attribute [Name('dup'), Name('mul')] will be evaluated to `FunctionValue` with the same `value` (i.e., [Name('dup'), Name('mul')]. The evaluated `FunctionValue` is pushed onto the stack.
Handling of code-arrays ; if/ifelse, repeat, forall operators

Recall that a `Block` represents the blocks of if, ifelse, repeat, and forall operators and the function bodies. As explained above, when a `Block` is evaluated, a `FunctionValue` object having the same list of tokens is pushed to the opstack. Once a `FunctionValue` is on the stack several things can happen:

if it is the top item on the stack when a `def` is executed (i.e. the `FunctionValue` is the body of a function), it is stored as the value of the name defined by the def.

if it is the body part of an if/ifelse operator, the `psIf` (or `psIfElse`) function will execute (apply) the `FunctionValue`. For the if operator, the FunctionValue is executed only if the

“condition” argument for if operator is true. For the ifelse operator, if the “condition” argument is true, first `FunctionValue` is executed, otherwise the second `FunctionValue` is executed .

if it is the body part of a repeat operator, repeat function will execute (apply) the `FunctionValue` as part of the evaluation of the repeat loop.

if it is the body part of a forall operator, forall function will execute (apply) the `FunctionValue` as part of the evaluation of the forall.

finally, when a `Name` is looked up and if its value is a `FunctionValue`, then the expression represents a function call and the FunctionValue value will be executed (applied).

A `FunctionValue` can be executed by calling its `apply` method. `apply` should evaluate all tokens in the function body.

Testing

1. Testing your interpreter manually:

You can do manual tests by using the RELP tool, i.e., by typing PostScript code on the REPL command line and checking the contents of the opstack and dictstack after the PostScript code is interpreted. I suggest you print both the opstack and the dictstack when the “stack” operation called to help with the debugging. You can print the opstack and dictstack in different colors to improve the readability. `colors.py` file includes some predefined colors. To print a string `s` in green and then reset the color back to default, you can use the print statement : print(OKGREEN+s+CEND)

2. Testing your interpreter using the loader:

You can load the PostScript code in the loader and interpret it. The given test inputs are already included in `load.py`. The `for` loop in this file iterates over each test input included in `tests` list, interprets them, and print the content of the `opstack` . When you add a new PostScript test input to this file, make sure to add it to the `tests` list. The expected outputs for the given 31 tests are included in the “expected_output” dictionary in load.py.

3. Automated tests:

Sample Python unittests testing for the same inputs are included in `tests_part2.py` file. You should provide 5 additional Python unittests in addition to the provided tests. Make sure that the PostScript code in your tests are different than the provide examples and your tests include several operators. You will loose points if you fail to provide tests or if your tests are too simple.

Testing the parsing:

Before even attempting to run your full interpreter, make sure that your parsing is working correctly. Make sure you get the correct parsed output for the testcases. The parsed tokens for the 31 tests are provided in load.py – see the “parse_output” dictionary in load.py.