$24
Summary
The objective of this assignment is to write a type checker for a fragment of the C++ programming language. The type checker should return an "OK" at success, and report a type error at failure.
The recommended implementation is via a BNF grammar processed by BNFC. The grammar is explained in this book, in Section 2.10. The fragment of C++ covered is smaller than in Assignment 1. and does not really include any C++ specific features not available in C, except for the treatment of strings. You can take the grammar from here
The type system is partially characterized by formal rules in Chapter 4.
Method
First build the grammar or download it from the book website. In the type checker itself, the recommended procedure is two passes:
build a symbol table with all function types
type check and annotate the code by using this symbol table
Only the five built-in types
int double bool string void
are taken into account. Every expression has one of these types.
Types of functions in the symbol table can be represented in any way that stores their argument and return types. For instance, the function header
int f (double x, bool b)
can create a symbol table entry
f |- ([double, bool], int)
mapping the name f to a pair whose first component is the list of argument types and the second component is the return type.
Language and type system specification
Programs
A program is a sequence of definitions.
A program may also contain comments and preprocessor directives, which are just ignored by the parser (see Section 2.10).
Function definitions
A function definition has a type, a name, an argument list, and a body. Example:
int foo(double x, int y)
{
return y + 9 ;
}
An argument list is a comma-separated list of argument declarations. It is enclosed in parentheses ( and ).
An argument declaration has a type and an identifier, for instance
int x
Notice that argument declarations with multiple variables (int
x, y) are not included. A declaration that occurs as a statement (as shown below), can also have multiple variables. But it must have at least one variable.
A function body is a list of statements enclosed in curly brackets { and } .
Typing. The same function name may be used in at most one function definition. An argument list may only use each variable once.
All return statements in a function body must return an expression whose type is the return type of the function. You don't need to check that there actually is a return statement (you can do this optionally).
Statements
Any expression followed by a semicolon ; can be used as a statement.
Any declaration followed by a semicolon ; can be used as a statement. Declarations have one of the following formats:
type and one variable (as in function parameter lists):
int i ;
type and many variables
int i, j ;
type and one initialized variable
int i = 6 ;
Typing. The initializing expression must have the declared type.
Statements returning an expression, for example
return i + 9 ;
Typing. The type of the returned expression must be the same as the return type of the function in which it occurs.
While loops, with an expression in parentheses followed by a statement, for example:
while (i < 10) ++i ;
Typing. The expression must have type bool.
Conditionals: if with an expression in parentheses followed by a statement, else, and another statement. Examples:
if (x 0) return x ; else return y ;
Typing. The expression must have type bool.
Blocks: any list of statements (including empty list) between curly brackets. For instance,
{
int i = 2 ;
{
}
i++ ;
}
Typing. A variable may only be declared once on the same block level. The parameters of a function have the same level as the top-level block in the body.
Expressions
The following table gives the expressions and their precedence levels. Infix operators are assumed to be left-associative. The arguments in a function call can be expressions of any level. Otherwise, the subexpressions are assumed to be one precedence level above of the main expression.
Note. The table is not guaranteed to be exactly the same as in the C++ standard.
level
expression forms
explanation
type
16
literal
literal
literal type
16
identifier
variable
declared type
15
f(e,...,e)
function call
return type
14
v++, v--
in/decrement
(sugar)
13
++v, --v
in/decrement
(sugar)
12
e*e, e/e
mult, div
operand type (int or double)
11
e+e
add, sub
operand type (int, double or string)
11
e-e
add, sub
operand type (int or double)
9
e<e, ee, e=e, e<=e
comparison
bool
8
e==e, e!=e
(in)equality
bool
4
e&&e
conjunction
bool
3
e||e
disjunction
bool
2
v=e
assignment
type of LHS
Literals include integer literals, floating point literals, and string literals. There are also two boolean literals, true and false.
Typing. Integer, double, string, and boolean literals have types int, double, string, and bool, respectively.
Variables have the type declared in the nearest enclosing block. A variable must be declared before it is used in an expression.
The arguments of a function call must have types corresponding to the argument types of the called function. The number of arguments must be the same as in the function declaration (thus the C++ default argument rule is not applied). Notice that only identifiers are used as functions.
Increments and decrements are assumed only to apply to variables.
Comparison and (in)equality apply to two operands of the same type, which is int, double, string, or bool.
Conjunction and disjunction apply to operands of type bool only.
Assignments have the same type as the left-hand-side variable. Notice that only variables are used as left-hand-sides.
Identifiers
An identifier is a letter followed by a list of letters, digits, and underscores.
Comments
There are three kinds of comments.
anything between tokens /* and */
anything from token // to the end of a line or the file
anything from token # to the end of a line or the file (preprocessor directive)
Comments cannot be nested.
Input and output
The type checker must be a program called tccpp, which is executed by the command
tccpp <SourceFile
and prints its output to the standard output. The output at success must be just the line OK.
The output at failure must be the line TYPE
ERROR followed by a message indicating the first error. For syntax errors, the line must say SYNTAX
ERROR.
The easiest way to produce this format is to use the ready-made files in either of
Haskell package
Java package
Here is an example of success:
Source file
// file good.cc
int factr (int n)
{
if (n<2)
return 1 ;
else
return (n * factr(n-1)) ;
}
int main ()
{
int i = factr(7) ;
return 0 ;
}
Running the type checker
% tccpp good.cc
OK
Here is an example of failure:
Source file
// file bad.cc
int f (double c)
{
int n = 1 ;
while(c) ++n ;
}
Running the type checker
% tccpp bad.cc
TYPE ERROR
condition c in while: expected bool, found double
Compiling the type checker
The type checker is submitted as a tar package, which contains a Makefile such that typing
make
produces the executable program tccpp in a normal Unix environment. The standard compilers and parser and lexer tools can be assumed to be a part of this environment and should not be included in the tar package.
The easiest way to produce this behaviour is to use the ready-made files in either the Haskell package or the Java package.
A hint on file structure
The simplest way of producing the type checker is to use the BNFC-generated test file as a template. Just modify its main function so that, instead of the parse tree, it prints the type checker output specified above.
We recommend the use of either the Haskell package or the Java package. In these packages, you only have to modify two file:
CPP.cf: the grammar
TypeChecker.hs or TypeChecker.java, depending on your implementation language
Also see Section 4.9--4.11 for what the code can look like.