Starting from:

$35

Project 3: Parsing and Type Checking Solution

Your goal is to write a predictive parser and write a type checker for a given language. The parser checks the syntax of the input and the type checker enforces the semantic rules of the language. The semantic rules that we are interested in specify which assignments are valid and which expressions are valid.

The input to your code will be a program and the output will be:




an error message if there is a type mismatch or syntax error or



lists of symbols of different types if there is no error.



Grammar Description









program




scope




scope list




scope list




scope list




scope list




declaration




type decl




type name




var decl




id list




id list




stmt list




stmt list




stmt




stmt




assign stmt




while stmt




expr




expr




term




term




term




factor




factor




factor




factor




condition




condition




primary




primary




primary




relop




relop




relop




relop




relop










! scope
To be added


! LBRACE scope list RBRACE












! stmt
scope_list → scope
! declaration
scope_list → scope scope_list
! stmt scope list


! declaration scope list




! type decl | var decl




! TYPE id list COLON type name SEMICOLON




! REAL | INT | BOOLEAN | STRING | LONG | ID




! VAR id list COLON type name SEMICOLON


! ID


! ID COMMA id list


! stmt


! stmt stmt list


! assign stmt


! while stmt


! ID EQUAL expr SEMICOLON


! WHILE condition LBRACE stmt list RBRACE


! term


! term PLUS expr


! factor


! factor MULT term
To be deleted
!
factor DIV term




LPAREN expr RPAREN



NUM



REALNUM



ID



ID



primary relop primary



ID



NUM



REALNUM



GREATER



GTEQ



LESS



NOTEQUAL



LTEQ














The tokens used in the grammar description are:




TYPE
= TYPE
VAR
= VAR
REAL
= REAL
INT
= INT
BOOLEAN
= BOOLEAN
STRING
= STRING
LONG
= LONG
WHILE
= WHILE
COMMA
= ,
COLON
= :
SEMICOLON
= ;
LBRACE
= {
RBRACE
= }
LPAREN
= (
RPAREN
= )
EQUAL
= =
PLUS
= +
MULT
= *
DIV
= /
GREATER
=
GTEQ
= =
LESS
= <
LTEQ
= <=
NOTEQUAL
= <
ID
= letter (letter + digit)*
NUM
= 0 + (pdigit digit*)
REALNUM
= NUM \. digit digit*







Language Semantics









2.1. Scoping Rules




Lexical scoping is used. Every scope defines a scope.




2.2. Types




The language has five built-in types: INT , REAL , BOOLEAN , STRING , and LONG . Programmers can also declare new types with a type decl which can appear anywhere in the program, except in







the statement list of a while stmt .







2.3. Variables




Programmers declare variables explicitly in var decl . The names of the declared variables appear in the id list and their type is the type name .




Example




Consider the following program written in our language:










{




TYPE a : INT;




TYPE b : a;










2



VAR x : b;




VAR y : c;




y = x;




}







This program has two declared variables: x and y .







2.4. Declaration vs. Use




Any appearance of a name (type or variable) in the program is either a declaration or a use. The following lists all possible declarations of a name:




Any appearance of a name in the id list part of a type decl



Any appearance of a name in the id list part of a var decl






Any other appearance of a name is considered a use of that name. Note that the above definitions exclude the built-in type names.




Given the following example (the line numbers are not part of the input):










01 {




TYPE a : INT;



TYPE b : a;



VAR x : b;



VAR y : c;



y = x;



}






We can categorize all appearances of names as declaration or use:




Line 2, the appearance of name a is a declaration



Line 3, the appearance of name b is a declaration



Line 3, the appearance of name a is a use



Line 4, the appearance of name x is a declaration



Line 4, the appearance of name b is a use



Line 5, the appearance of name y is a declaration



Line 5, the appearance of name c is a declaration



Line 6, the appearance of name y is a use



Line 6, the appearance of name x is a use






2.5. Type System




Our language uses structural equivalence for checking type equivalence.




Here are all the type rules/constraints that your type checker will enforce (constraints are labeled from




C1 to C4 for reference):




C1: The left hand side of an assignment should have the same type as the right hand side of that assignment



3



C2: The operands of an operation ( PLUS , MINUS , MULT , and DIV ) should have the same type (it can be any type, including STRING and BOOLEAN )



C3: The operands of a relational operator (see relop in grammar) should have the same type (it can be any type, including STRING and BOOLEAN )



C4: condition should be of type BOOLEAN



The type of an expr is the same as the type of its operands



The result of p1 relop p2 is of type BOOLEAN (assuming that p1 and p2 have the same type)






NUM constants are of type INT



REALNUM constants are of type REAL



If two types cannot be determined to be the same according to the above rules, the two types are different



Parser






You must write a parser for the grammar, If your parser detects a syntax error in the input, it should output the following message and exit:










Syntax Error







You can start coding by writing the parser first and then move on to implementing the type checking part. You should make sure that your parser generates a syntax error message if the input program does not follow the proper syntax.




We recommend that you check your code on the submission website to make sure it passes all the test cases in the parsing category before moving on to implementing the type checking part.




Our grammar is not LL(1) i.e. it does not satisfy the conditions for predictive parser with one token lookahead, however, it is still possible to write a recursive descent parser with no backtracking. We can do this by looking ahead at more than one token in some cases and left-factoring some rules. In particular, parsing condition requires more than one token lookahead. Also, two rules like







stmt list ! stmt




stmt list ! stmt stmt list







can be parsed by first parsing stmt and then checking if the next token is the beginning of stmt list or in the FOLLOW of stmt list . In fact the two rules are equivalent to







stmt list ! stmt stmt list1




stmt list1 ! stmt list | ✏




but you do not need to explicitly left-factor the rules by introducing stmt list1 to parse stmt list .




























4
Output






Your program will check for the following semantic errors and output the correct message when it en-counters that error. Note that there will only be at most one error per test case.




4.1. Redeclaration Errors




Errors involving programmer-defined types:



Programmer-defined type declared more than once (error code 1.1)



A type is declared more than once if it appears in more than one id list of type decl in the same scope. Declaring the same type name is allowed in non-overlapping scopes and in nested scopes (lexical scoping).




Programmer-defined type redeclared as a variable (error code 1.2)



A type name is redeclared as a variable if the name appears first in an id list of a type decl and appears again in id list of var decl in the same scope. Using the same programmer defined name for a variable and a type is allowed in non-overlapping scopes and in nested scopes (lexical scoping)




Programmer-defined type used as variable (error code 1.3)



If resolving a variable name returns a type declaration, the type is used as a variable.




Undeclared type (error code 1.4)



If resolving a type name returns no declaration, the type is undeclared.




Errors involving variable declarations:



Variable declared more than once (error code 2.1)



An explicitly declared variable can be declared again explicitly by appearing as part of an id list in a variable declaration and resolving the name at the site of the new declaration returns the first declaration.




Programmer-defined variable redeclared as a type (error code 2.2)



A variable name is redeclared as a type if the name appears first in an id list of a var decl and appears again in id list of type decl in the same scope. Using the same programmer defined name for a variable and a type is allowed in non-overlapping scopes and in nested scopes (lexical scoping)




Variable used as a type (error code 2.3)



If the type name in a variable declaration resolves to a variable declaration, the variable is used as a type.




Undeclared variable (error code 2.4)



If resolving a variable name that appears in a statement other than a declaration returns no declaration, the variable is undeclared.




Errors involving built-in type:



Redeclaration or use of a built-in type name If a built-in type name appears in id list of a variable declaration or a type declaration or appears in a statement other than a declaration statement, your parser should output syntax error.






For each error involving variable declarations and errors involving type declarations, your type checker should output one line in the following format:










ERROR CODE <code <symbol_name







in which <code should be replaced with the proper code (see the error codes listed above) and







<symbol name should be replaced with the name of the type or variable that caused the error. Since







the test cases will have at most one error each, the order in which these error messages are printed does not matter.




5
4.2. Type Mismatch




If any of the type constraints (listed in the Type System section above) is violated in the input program, then the output of your program should be:










TYPE MISMATCH <line_number <constraint







Where <line number is replaced with the line number that the violation occurs and <constraint should be replaced with the label of the violated type constraint (possible values are C1 through C4, see section on Type System for details of each constraint). Note that you can assume that anywhere a violation can occur it will be on a single line.







4.3. No Semantic Errors




If there are no semantic errors in the program, then your program should output for each use of a programmer-defined type name and variable name, in the order in which the use appears in the pro-gram, the line number in which the use appears and the line number of the declaration to which the use of the name resolves. For this part, the format should be the following










<name_used_1 <line_no_use_1 <line_no_declared_1 <name_used_2 <line_no_use_2 <line_no_declared_2 ...







Where <name use i is the name of a variable or a type and corresponds to i’th name use in the pro-gram. <line no use i is the line number in which the i’th use appears and <line no declared i is the line number of the declaration corresponding to the i’th use.







Examples



Given the following example (the line numbers are not part of the input):










01 {




TYPE a, b, c, b : INT;



VAR x : b;



VAR y : c;



y = x;



}






The output will be the following:










ERROR CODE 1.1 b


































6



Given the following example (the line numbers are not part of the input):










01 {




TYPE a : INT;



VAR x : INT;



VAR b, a : STRING;



x = 10;



}






The output will be the following:










ERROR CODE 1.2 a







Given the following example (the line numbers are not part of the input):










01 {




02 VAR x : INT;




VAR y, x : STRING;



x = 10;



}






The output will be the following:










ERROR CODE 2.1 x







Given the following example (the line numbers are not part of the input):










01 {




04 VAR x : INT;




TYPE y, x : STRING;



x = 10;



}






The output will be the following:










ERROR CODE 2.2 x







Given the following example (the line numbers are not part of the input):










01 {




02 VAR x1 : INT;




VAR y, x2 : STRING;



y = x1;



}









7



The output will be the following:










TYPE MISMATCH 4 C1







Given the following example (the line numbers are not part of the input):










01 {




02 VAR




x : INT;



VAR y : REAL;



WHILE x < 10



{



x = x + y;



}



}






The output will be the following:










TYPE MISMATCH 7 C2







Given the following example (the line numbers are not part of the input):










01 {




TYPE t : INT;



VAR a, b : t;



04
{
VAR a
: INT;
05


WHILE
a b
06


{


07




a = a + b;
08


}





}



}






The output will be the following:










t 3 2




a 5 4




b 5 3




a 7 4




a 7 4




b 7 3

























8
Evaluation






Your submission will be graded on passing the automated test cases.




The test cases (there will be multiple test cases in each category, each with equal weight) will be broken down in the following way (out of 100 points):




Parsing: 35 points



Errors involving programmer-defined types (error codes 1.x): 15 points



Errors involving variable declarations (error codes 2.x): 20 points



Type mismatch errors and no semantic error cases: 30 points



The parsing category is not partially graded, you need to pass all test cases in that category to get the 35 points. All other categories are partially graded.






































































































































































9

More products