$24
1 Top-down parsing
1.1 LL(1) form
Rewrite the following grammar into LL(1) form, by left factoring it and elimi- nating left recursion.
S → sC T |sC T wB C → c
T → t|
B → Ba|a
1.2 Parsing table
Tabulate the FIRST and FOLLOW sets of the nonterminals in the resulting grammar, and construct the predictive parsing table.
2 VSL specification
The directory in the code archive ps2 skeleton.tgz begins a compiler for a slightly modified 64-bit version of VSL (“Very Simple Language”), defined by Bennett (Introduction to Compiling Techniques, McGraw-Hill, 1990).
Its lexical structure is defined as follows:
• Whitespace consists of the characters ’\t’, ’\n’, ’\r’, ’\v’ and ’ ’. It is ignored after lexical analysis.
• Comments begin with the sequence ’//’, and last until the next ’\n’ char- acter. They are ignored after lexical analysis.
• Reserved words are FUNC, BEGIN, END, RETURN, PRINT, CONTINUE, IF, THEN, ELSE, WHILE, DO, and VAR.
• Operators are assignment (:=), the basic arithmetic operators ’+’, ’-’, ’*’,
’/’, and relational operators ’=’, ’<’, ’’.
• Numbers are sequences of one or more decimal digits (’0’ through ’9’).
• Strings are sequences of arbitrary characters other than ’\n’, enclosed in double quote characters ’”’.
• Identifiers are sequences of at least one letter followed by an arbitrary se- quence of letters and digits. Letters are the upper- and lower-case English alphabet (’A’ through ’Z’ and ’a’ through ’z’), as well as underscore (’ ’). Digits are the decimal digits, as above.
The syntactic structure is given in the context-free grammar on the last page of this document.
Building the program supplied in the archive ps2 skeleton.tgz combines the contents of the src/ subdirectory into a binary src/vslc which reads standard input, and produces a parse tree.
The structure in the vslc directory will be similar throughout subsequent problem sets, as the compiler takes shape. See the slide set from the PS2 recitation for an explanation of its construction, and notes on writing Lex/Yacc specifications.
2.1 Scanner
Complete the Lex scanner specification in src/scanner.l, so that it properly tokenizes VSL programs.
2.2 Tree construction
A node t structure is defined in include/ir.h. Complete the auxiliary functions node init, and node finalize so that they can initialize/free node t-sized memory areas passed to them by their first argument. The function destroy subtree should recursively remove the subtree below a given node, while node finalize should only remove the memory associated with a single node.
2.3 Parser
Complete the Yacc parser specification to include the VSL grammar, with se- mantic actions to construct the program’s parse tree using the functions im- plemented above. The top-level production should assign the root node to the globally accessible node t pointer ’root’ (declared in src/vslc.c).
program → global list
global list → global | global list global global → f unction | declaration
statement list → statement | statement list statement print list → print item | print list 0 ,0 print item
expression list → expression | expression list 0 ,0 expression variable list → identif ier | variable list 0 ,0 identif ier argument list → expression list |
parameter list → variable list |
declaration list → declaration | declaration list declaration f unction → F U N C identif ier 0 (0 parameter list 0 )0 statement statement → assignment statement | return statement
statement
statement
→
→
print statement | if statement
while statement | null statement | block
block
→
BEGI N
declaration list statement list EN D
block
→
BEGI N
statement list EN D
assignment statement → identif ier 0 :0 0 =0 expression
return statement → RET U RN expression print statement → P RI N T print list
null statement → C ON T I N U E
if statement → I F relation T H EN statement
if statement → I F relation T H EN statement ELSE statement while statement → W H I LE relation DO statement
relation → expression 0 =0 expression
relation → expression 0 <0 expression relation → expression 0 0 expression expression → expression 0 +0 expression
expression
→
expression
expression
→
expression
expression
→
expression
0 −0 expression
0 ∗0 expression
0 /0 expression expression → 0 −0 expression
expression → 0 (0 expression 0 )0
0 (0 argument list 0 )0
expression
→
number | identif ier | identif ier
declaration
→
V AR variable list
print item
→
expression | string
identif ier → I DEN T I F I ER
number → N U M BER
string → ST RI N G