$24
An Interpreter for a Simple Postscriptlike Language
Weight: The entire interpreter project (Part A and Part B together) will count for 10% of your course grade. Part B is worth 8%.
This assignment is to be your own work. Refer to the course academic integrity statement in the syllabus.
Turning in your assignment
Rename your Part A submission file as HW2_partB.py and continue developing your code in the HW2_partB.py file. I strongly encourage you to save a copy of perio you can go back in time if you really mess something up. Using a version control system like GitHub would be a good idea! (Please do not make your GitHub repo pub create private repos on the EECS GitHub server at https://github.eecs.wsu.edu/) To submit your assignment, turn in your file by uploading on the Assignment2(Interpreter DROPBOX on Blackboard (under AssignmentSubmisions menu).
The file that you upload must be named HW2_partB.py . Be sure to include your name as a comment at the top of the file. Also in a comment, indicating whether y intended for Unix/Linux or Windows. You may turn in your assignment up to 5 times. Only the last one submitted will be graded. Please let the instructor know if you n resubmit it a 6th time.
Implement your code for Python 3. The TA will run all assignments using Python3 interpreter. You will lose points if your code is incompatible with Python
The work you turn in is to be your own personal work. You may not copy another student's code or work together on writing code. You may not copy code from the anything else that lets you avoid solving the problems for yourself.
Grading
The assignment will be marked for good programming style (appropriate algorithms, good indentation and appropriate comments refer to the Python style guide) as thoroughness of testing and clean and correct execution. You will lose points if you don't (1) provide test functions, (2) explain your code with appropriate comments, a a good programming style.
The Problem
In this assignment you will write an interpreter in Python for a simplified PostScriptlike language, concentrating on key computational features of the abstract machin all PS features related to graphics, and using a somewhatsimplified syntax. The simplified language, SPS, has the following features of PS:
integer constants, e.g. 123: in python3 there is no practical limit on the size of integers
string constants, e.g. (CptS355): string delimited in paranthesis
name constants, e.g. /fact: start with a / and letter followed by an arbitrary sequence of letters and numbers
names to be looked up in the dictionary stack, e.g. fact: as for name constants, without the /
code constants: code between matched curly braces { ... }
builtin operators on numbers: add, sub, mul, div, mod
builtin operators on string values: length, get, put, getinterval. Check Appendix for more information on string functions.
builtin loop operator: for; make sure that you understand the order of the operands on the stack. Play with ghostscript if necessary to help understand what is h
stack operators: dup, exch, pop, roll,copy, clear
dictionary creation operator: dict; takes one operand from the operand stack, ignores it, and creates a new, empty dictionary on the operand stack
dictionary stack manipulation operators: begin, end. begin requires one dictionary operand on the operand stack; end has no operands.
name definition operator: def. This requires two operands, a name and a value
defining (using def) and calling functions
stack printing operator (prints contents of stack without changing it): stack
Part B Requirements (Due Oct 5)
In Part 2 you will continue building the interpreter, making use of everything you built in Part 1. The pieces needed to complete the interpreter are:
Parsing Postscript code
Handling of code arrays
Handling the for loop operator
Function calling
Actually interpreting input strings (code) in the simple Postscript language.
Parsing
Parsing is the process by which a program is converted to a data structure that can be further processed by an interpreter or compiler. Our SPS programs are very sim parse. Essentially all we need to do is convert the continuous input text to a list of tokens and convert each token to our chosen representation for it. One way to think as the minimumsized "interesting" chunks of the text of a program. In SPS the tokens are: multidigit numbers with optional negative sign, strings enclosed in paranthe character names (with and without a preceding /), and the curly brace characters. We've already decided about how some of these will be represented: numbers as P numbers, names as Python strings, strings as Python strings, etc. For code arrays, we will represent things falling between the braces using Python lists.
2., 3., 4. Handling of code arrays, for loop operator, function calling
1/4
12/21/ CptS 355 - Assignment 2 - Part A
Recall that a code array is pushed on the stack as a single unit when it is read from the input. Once a code array is on the stack several things can happen:
if it is the body part of a for loop, it is recursively interpreted as part of the evaluation of the for loop. At each iteration of the for loop the loop index is pushed ont (We will get to interpreting momentarily).
if it is the top item on the stack when a def is executed, it is stored as the value of the name defined by the def. Finally, if when a name is looked up you find that a code array, the code array is recursively interpreted.
5. Interpreter
A key insight is that a complete SPS program is essentially a code array. It doesn't have curly braces around it but it is a chunk of code that needs to be interpreted. T how to proceed: convert the SPS program (a string of text) into an list of tokens and code arrays. Define a Python function interpret that takes one of these lists as inp processes it. Interpret the body of the for loop operator recursively, for each value of the loop index. When a name lookup produce a code array as its result, recursive it, thus implementing Postscript function calls.
Parsing revisited
Parsing converts an SPS program in the form a string to a program in the form of a code array. It will work in two stages: first, convert all the string to a list of tokens. A convert the token list to a code array. The difference between the two will be that in the code array, everything between matching curly braces will be represented as a element (which itself is a code array). In the process of converting from token list to code array the curly braces will disappear, and the string representations of numb booleans will be converted to Python ints and strings.
Tokenizing
# For tokenizing we'll use the re package for Regular Expressions.
import re
def tokenize(s):
return re.findall("/?[azAZ][azAZ09_]*|[(][\w \W]*[)]|[]?[09]+|[}{]+|%.*|[^ \t\n]", s)
See what tokenize does print (tokenize(
"""
/fact { 0 dict begin
/n exch def 1
n 1 1 {mul} for end
}def
(factorial function !) 0 9 getinterval stack
5 fact stack
"""))
['/fact', '{', '0', 'dict', 'begin', '/n', 'exch', 'def', '1', 'n', '1', '1', '{', 'mul', '}', 'for', 'end', '}', 'def', '(factorial function !)', '0', '9', 'getinterval', 'stack', '5', 'fact', 'stack']
Grouping and converting to Python representations for numbers and strings
The output of tokenize isn't fully suitable because things between matching curly braces are not themselves grouped into a code array. We need to convert the output above example to
['/fact', [0, 'dict', 'begin', '/n', 'exch', 'def', 1, 'n', 1, 1, ['mul'], 'for', 'end'], def', '(factorial function !)', 0, 9, 'getinterval', 'stack', 5, 'fact', 'stack']
Notice how in addition to grouping tokens between curly braces into lists, we've also converted the strings that represent numbers to Python numbers. We have not re paranthesis that delimit the postscript string constants (e.g., (factorial function !)). Keeping the paranthesis for strings helps with typechecking for string arguments.
The main issue in how to convert to a code array is how to group things that fall in between matching curly braces. Here is the code for one of those ways using an ite
The it argument is an iterator that returns left or right parenthesis characters.
The sequence of characters returned by the iterator should represent a string of properly nested
parentheses pairs, from which the leading '(' has already been removed. If the
parenteses are not properly nested, returns False.
def groupMatching(it):
res = ['(']
for c in it:
if c==')':
res.append(')')
return res
else:
note how we use a recursive call to group the inner
matching parenthesis string and append it as a whole
to the list we are constructing.
Also note how we've already seen the leading '(' of this
inner group and consumed it from the iterator. res.append(groupMatching(it))
return False
http://www.eecs.wsu.edu/~arslanay/CptS355/assignments/interpreterPartB- .html
2/4
12/21/ CptS 355 - Assignment 2 - Part A
function to turn a string of properly nested parentheses
into a list of properly nested lists.
def group(s):
if s[0]=='(':
return groupMatching(iter(s[1:]))
else: return False # If it starts with ')' it is not properly nested
Here we use an interator constructed from a string, but the iter function will equally well create an iterator from a list. Of course, your code has to deal with curly brace parentheses and it must also deal with strings that contain tokens in addition to the curly braces. And don't forget that strings representing numbers should be convert this stage as well. I urge you to not include the curly brace characters in the resulting code array. The structure of the code array itself is sufficient for what we will do w
To illustrate the above point, consider this modified version of groupMatching and group which doesn't put the paren characters into its result. Just the structure of the sufficient to show the structure of the orginal string.
The it argument is an iterator that returns left or right parenthesis characters.
The sequence of return characters should represent a string of properly nested
parentheses pairs, from which the leading '(' has already been removed. If the
parenteses are not properly nested, returns False.
def groupMatching(it):
res = []
for c in it:
if c==')':
return res
else:
note how we use a recursive call to group the inner
matching parenthesis string and append it as a whole
to the list we are constructing. res.append(groupMatching(it))
return False
function to turn a string of properly nested parentheses
into a list of properly nested lists.
def group(s):
if s[0]=='(':
return groupMatching(iter(s[1:]))
else: return False # If it starts with ')' it is not properly nested
group('(()(()))')
Your parsing implementation
Write your parsing code here; it takes a list of tokens produced by tokenize
and returns a code array; Of course you may create additional functions to help you write parse()
#
def parse(tokens):
pass
interpret
We're now ready to write the interpret function. It takes a code array as argument, and changes the state of the operand and dictionary stacks according to what it find doing any output indicated by the SPS program (using the stack operator) along the way. interpret may be called recursively from the for loop operator, or when a nam up and its value is a code array.
Write the necessary code here; again write
auxiliary functions if you need them. This will probably be the largest
function of the whole project, but it will have a very regular and obvious
structure if you've followed the plan of the assignment.
#
def interpret(code): # code is a code array
pass
interpreter
Finally, we can write the interpreter function that treats a string as an SPS program and interprets it.
Copy this to your HW2_partB.py file def interpreter(s): # s is a string
interpret(parse(tokenize(s)))
A note about the "put" operator:
The put operand gets a string, an index (integer) value, and an ASCII(integer) value (from the stack), replaces the character at position index in the string with the give character. The resulting string, however,is not stored in the operand stack. Hence, for explicit use of put operator, we can define a string variable and apply the put op the value of this variable. For example:
/s1 (CptS355) def
s1 0 99 put
http://www.eecs.wsu.edu/~arslanay/CptS355/assignments/interpreterPartB- .html 3/4
12/21/ CptS 355 - Assignment 2 - Part A
Interpreting the above postscript expression requires redefining the variable /s1. In your interpreter we will simplify this: If put operation is applied on the value of a var don't need to update the variable value in the dictionary stack.
Testing
First test the parsing
Before even attempting to run your full interpreter, make sure that your parsing is working correctly. Do you get what you expect as the result of the following:
parse(tokenize(
'''
1 1 5 {10 mul} for
'''
))
Make sure that the result contains a python integer. How about:
parse(tokenize(
'''
1
n 1 1 {mul} for
'''
))
Make sure that there are two nested code arrays.
You should know what the correct result is for the following more complicated example. Is your code producing the right answer? There's not much point in going on u
parse(tokenize(
"""
/fact {
dict
begin
/n exch def 1
n 1 1 {mul} for end
}def
(factorial function !) 0 9 getinterval stack
5 fact stack
"""))
Finally, test the full interpreter
Copy this to your HW2_partB.py file. Add tests of your own. interpreter(
"""
/fact{ 0 dict
begin
/n exch def
1
n 1 1 {mul} for
end
}def
(factorial function !) 0 9 getinterval
stack
fact
stack
"""
)
http://www.eecs.wsu.edu/~arslanay/CptS355/assignments/interpreterPartB- .html 4/4