$24
The goal of this assignment is to learn how to implement ADTs in C. We will discuss the typedef and struct commands, header files, information hiding, constructors and destructors, and memory management. You will write a program in C that recreates the Dictionary ADT from pa3.
Creating New Data Types in C
The struct keyword is used to create a new aggregate data type called a structure or just struct, which is the closest thing C has to Java's class construct. Structs contain data fields, but no methods, unlike Java classes. A struct can also be thought of as a generalization of an array. An array is a contiguous set of memory areas all storing the same type of data, whereas a struct may be composed of different types of data. The general form of a struct declaration is
struct structure_tag{
// data field declarations
};
Note the semicolon after the closing brace. For example
struct person{
int age;
int height;
char first[20];
char last[20];
};
The above code does not allocate any memory however, and in fact does even not create a complete type called person. The term person is only a tag which can be used with struct to declare variables of the new type.
struct person fred;
After this declaration fred is a local variable of type struct person, i.e. a symbolic name for an area of stack memory storing a person structure. By comparison, a Java reference variable is a pointer to heap memory. As we shall see, it is possible and desirable to declare C structures from heap memory as well. The variable fred contains four components, which can be accessed via the component selection (dot ".") operator:
fred.age = 27;
fred.height = 70;
strcpy(fred.first, "Fredrick");
strcpy(fred.last, "Flintstone");
See the man pages or google to learn more about strcpy() in the library string.h.
The struct command is most often used in conjunction with typedef, which establishes an alias for an existing data type. The general form of a typedef statement is:
typedef existing_type new_type;
For instance
1
typedef int feet;
defines feet to be an alias for int. We can then declare variables of type feet by doing
feet x = 32;
Using typedef together with struct allows us to declare variables of the structure type without having to include struct in the declaration. The general form of this combined typedef statement is:
typedef struct structure_tag{
/* data field declarations */
} new_type;
The structure_tag is only necessary when one of the data fields is itself of the new type, and can otherwise be omitted. Often the tag is included simply as a matter of convention. Also by convention structure_tag and new_type are the same identifier, since there is no reason for them to differ. Going back to the person example above we have
typedef struct person{
int age;
int height;
char first[20];
char last[20];
} person;
We can now declare
person fred;
and assign values to the data fields of fred as before. It is important to remember that the typedef statement itself allocates no memory, only the declaration does. To allocate a person structure from heap memory, we do
person* pFred = malloc(sizeof(person));
The variable pFred points to a person structure on the heap. Note that pFred itself is a symbolic name for an area of stack memory storing the address of a block of heap memory storing a person structure. This is essentially the situation one has in java when declaring a reference variable of some class type. To access the components of the person structure pointed to by pFred, we must first dereference (i.e. follow) the pointer using the indirection (value-at) operator *. Unfortunately the expression *pFred.first is not valid since the component selection (dot ".") operator has higher precedence than value-at *. We could insert parentheses to get (*pFred).first, but this leads to some unwieldy expressions. Fortunately C provides a single operator combining the value-at and dot operators called the indirect component selection (arrow -) operator. Note this operator is represented by two characters with no separating space. To assign values to the components of the person pointed to by pFred, do
pFred-age = 27;
pFred-height = 70;
strcpy(pFred-first, "Fredrick");
strcpy(pFred-last, "Flintstone");
2
Thus the C operator that is equivalent to the familiar dot operator in Java is not component selection (dot "."), but indirect component selection (arrow "-"). The following example defines a new data type called NodeObj that has a pointer to NodeObj as one of its members.
typedef struct NodeObj{
int item;
struct NodeObj* next;
} NodeObj;
In this case the NodeObj tag is necessary since the definition itself refers to NodeObj. Observe however that within the body of the structure definition, NodeObj is referred to as struct NodeObj since the typedef statement is not yet complete. Outside the structure definition we can simply use NodeObj as a new type name. Another typedef statement defines Node as being a pointer to NodeObj.
typedef NodeObj* Node;
To declare and initialize a reference (pointer) to a NodeObj we do
Node N = malloc(sizeof(NodeObj));
N-item = 5;
N-next = NULL;
Two rules to remember when using structures to define ADTs in C: (1) always use typedef and struct together as in the last example do define new structure types and pointers to those types, and (2) always declare your structure variables as pointers to heap memory and access their components via the arrow operator. Do not declare structure variables from stack memory. Our goal here is to emulate as closely as possible the class construct as it appears in the Java language.
Information Hiding
The C language does not include access modifiers such as Java's public and private keywords. To enforce the principle of information hiding we split the definition of an ADT into two files called the header file (with suffix .h), and the implementation file (with suffix .c). The header file constitutes the ADT interface, and is roughly equivalent to a Java interface file. It contains the prototypes of all public ADT operations together with typedef statements defining exported types. One of the exported types in the header file is a pointer (also called a handle or reference) to a structure that encapsulates the data fields of the ADT. The definition of that structure is placed in the implementation (.c) file, along with definitions of any private types, together with function definitions, both public and private. The implementation (.c) file will #include its own header (.h) file. ADT operations are defined to take in and return references to the structure type, exactly as is the case in Java.
A client module will then #include the header (.h) file giving it the ability to declare variables of the reference type, as well as functions that either take or return reference type parameters. The client cannot dereference this pointer however, since the structure it points to is not defined in the header file. The ADT operations take reference arguments, so the client does not need to (and is in fact unable to) directly access the structure these references point to. The client can therefore interact with the ADT only through the public ADT operations and is prevented from accessing the interior of the so called 'black box'. This is how information hiding is accomplished in C. One establishes a function as public by including its prototype in the header file, and private by leaving it out of the header file. Likewise for the reference types belonging to an ADT.
3
We illustrate with the following C implementation of an IntegerStack based on a linked list. This example has been abridged to save space. The unabridged version is posted on the webpage.
//-----------------------------------------------------------------------------
// IntegerStack.h
// Header file for the IntegerStack ADT
//-----------------------------------------------------------------------------
#ifndef _INTEGER_STACK_H_INCLUDE_
#define _INTEGER_STACK_H_INCLUDE_
Stack
Exported reference type typedef struct StackObj* Stack;
newStack()
constructor for the Stack type Stack newStack(void);
freeStack()
destructor for the Stack type void freeStack(Stack* pS);
//-----------------------------------------------------------------------------
// prototypes of ADT operations deleted to save space, see webpage
//-----------------------------------------------------------------------------
// printStack()
// prints a text representation of S to the file pointed to by out // pre: none
void printStack(FILE* out, Stack S);
#endif
This file contains some preprocessor commands we haven't yet seen for conditional compilation, namely #ifndef and #endif. If the C compiler encounters multiple definitions of the same type, or multiple prototypes of any function, it is considered a syntax error. Therefore when a program contains several files, each of which may #include the same header file, it is necessary to place the content of the header file within a conditionally compiled block (sometimes called an “include guard”), so that the prototypes etc. are seen by the compiler only once. The general form of such a block is
#ifndef _MACRO_NAME_
#define _MACRO_NAME_
statements
#endif
If _MACRO_NAME_ is undefined then the statements between #ifndef and #endif are compiled. Otherwise these statements are skipped. The first operation in the block is to #define _MACRO_NAME_. Notice that the macro is not defined to be anything, it just needs to be defined. It is customary to choose _MACRO_NAME_ in such a way that it is unlikely to conflict with any "legitimate" macros. Therefore the name usually begins and ends with an underscore _ character.
The next item in IntegerStack.h is the typedef command defining Stack to be a pointer to struct StackObj. The definition of struct StackObj, which contains the data fields for the IntegerStack ADT, will be placed in the implementation file. Next are prototypes for the constructor newStack() and destructor freeStack(), followed by prototypes of ADT operations (skipped in this document, but given on the class
4
webpage.) Finally a prototype is included for a function printStack() corresponding roughly to the toString() method in java. An abridged version of the implementation file follows.
//-----------------------------------------------------------------------------
// IntegerStack.c
// Implementation file for IntegerStack ADT
//-----------------------------------------------------------------------------
#include<stdio.h
#include<stdlib.h
#include<string.h
#include<assert.h
#include"IntegerStack.h"
// private types --------------------------------------------------------------
// NodeObj
typedef struct NodeObj{
int item;
struct NodeObj* next;
} NodeObj;
// Node
typedef NodeObj* Node;
newNode()
constructor of the Node type Node newNode(int x) {
Node N = malloc(sizeof(NodeObj)); assert(N!=NULL);
N-item = x; N-next = NULL; return(N);
}
freeNode()
destructor for the Node type
void freeNode(Node* pN){
if( pN!=NULL && *pN!=NULL ){
free(*pN);
*pN = NULL;
}
}
// StackObj
typedef struct StackObj{
Node top;
int numItems;
} StackObj;
// public functions -----------------------------------------------------------
// newStack()
// constructor for the Stack type
Stack newStack(void){
Stack S = malloc(sizeof(StackObj));
assert(S!=NULL);
S-top = NULL;
S-numItems = 0;
return S;
}
5
freeStack()
destructor for the Stack type void freeStack(Stack* pS){
if( pS!=NULL && *pS!=NULL ){
if( !isEmpty(*pS) ) popAll(*pS); free(*pS);
*pS = NULL;
}
}
//-----------------------------------------------------------------------------
// definitions of ADT operations deleted to save space, see webpage
//-----------------------------------------------------------------------------
// printStack()
// prints a text representation of S to the file pointed to by out // pre: none
void printStack(FILE* out, Stack S){
Node N;
if( S==NULL ){
fprintf(stderr,
"Stack Error: calling printStack() on NULL Stack reference\n"); exit(EXIT_FAILURE);
}
for(N=S-top; N!=NULL; N=N-next) fprintf(out, "%d ", N-item);
fprintf(out, "\n");
}
This implementation file defines several private types, namely NodeObj, Node and StackObj. Type Node is a pointer to NodeObj, which is the basic building block for a linked list. Type StackObj encapsulates the data fields for a stack. Recall that type Stack was defined in the header file to be a pointer to the structure struct StackObj. Type Stack is the reference through which the client interacts with ADT operations.
Memory Management
Each of the structure types defined in the above example have their own constructor that allocates heap memory and initializes data fields, as well as a destructor that balances calls to malloc() and calloc() in the constructor with corresponding calls to free(). Observe that the arguments to freeNode() and freeStack() are not the reference types Node and Stack, but are instead pointers to these types. The reason for this is that the destructor must alter, not just the object a reference points to, but also the reference itself by setting it to NULL. Why must the destructor do this? Recall that maintaining a pointer to a free block on the heap is a major memory error in C. The responsibility for setting such pointers safely to NULL should lie with the ADT module, not the with client module. To accomplish this reference types are themselves passed by reference to their destructor.
As in Java, all ADT operations should check their own preconditions and exit with a useful error massage when one is violated. This message should state the module and function in which the error occurred, and exactly which precondition was violated. The purpose of this message is to provide diagnostic assistance to the designer of the client module. In C however there is one more item to check. Every ADT operation should verify that the reference that is its main argument is not NULL. This check should come before the checks of preconditions since any attempt to dereference a NULL handle will result in a segmentation fault. The reason this was not necessary in java was because calling an instance method on a null reference variable causes a NullPointerException to be thrown, which provides some error tracking to the client programmer.
Naming Conventions
6
Suppose you are designing an ADT in C called Blah. Then the header file will be called Blah.h and should define a reference type Blah that points to a structure type BlahObj.
typedef struct BlahObj* Blah;
The header file should also contain prototypes for ADT operations. The implementation file will be called Blah.c and should contain the statement
typedef struct BlahObj{
data fields for the Blah ADT } BlahObj;
together with constructors and destructors for the BlahObj structure. File Blah.c will also contain definitions of all public functions (i.e. those with prototypes in Blah.h) as well as definitions of private types and functions. The general form for the constructor and destructor (respectively) are
Blah newBlah(arg_list){
Blah B = malloc(sizeof(BlahObj));
assert( B!= NULL );
initialize the fields of the Blah structure return B;
}
and
void freeBlah(Blah* pB){
if( pB!=NULL && *pB!=NULL){
free all heap memory associated with *pB free(*pB);
*pB = NULL;
}
}
Again note that the destructor passes its Blah argument by reference, so it can set this pointer to NULL. Given a Blah variable B a call to freeBlah() would look like
freeBlah(&B);
The general form for an ADT operation is
return_type some_op(Blah B, other_parameters){ if( B==NULL ){
fprintf(stderr, "Blah Error: some_op() called on NULL Blah reference\n"); exit(EXIT_FAILURE);
}
check preconditions
do whatever some_op() is supposed to do
}
Most ADTs should also contain a printBlah() function that prints a text representation of a Blah object to a file stream. This function is roughly equivalent to the toString() function in Java.
7
void printBlah(FILE* out, Blah B){
if( B==NULL ){
fprintf(stderr,
"Blah Error: printBlah() called on NULL Blah reference\n"); exit(EXIT_FAILURE);
}
// calls to fprintf(out, text_representation_of_B)
}
What to Turn In
Study the unabridged version of the IntegerStack ADT on the webpage. It is recommended that you use it as a starting point for your Dictionary ADT in C. The Dictionary in this assignment is largely the same that in pa3, so re-familiarize yourself with the specifications for that assignment. As before the elements of the Dictionary will be pairs of strings called key and value respectively. Recall however that strings in C are represented by a null ('\0') terminated char array, rather than a built in data type as in Java. The C standard library string.h provides functions for operating on these primitive strings. See documentation for string.h at
http://www.acm.uiuc.edu/webmonkeys/book/c_guide/
and at
http://en.cppreference.com/w/c/string/byte
to learn about these functions, paying special attention to strcmp(), strcpy(), and strlen(). Function strcmp() is of particular importance in this assignment since it provides a simple way to tell if two strings are equal. Specifically, the expression strcmp(str1, str2)==0 returns true (=1) if the char array pointed to by str1 is the same sequence as that pointed to by str2, and false (=0) otherwise. Use this fact when writing the private function findKey().
The Node type for the underlying linked list should contain two data fields, each of type char*, representing the two strings key and value. The ADT operations are identical to those in pa3, except of course that certain return types and formal parameters are now char* rather than String. One other difference is that you must write a destructor for the Dictionary ADT, where none was necessary in pa3. You will also write a function called printDictionary() replacing the Java toString() method. Its output should be formatted to look exactly like that of toString().
The interface for the Dictionary ADT is embodied in the file Dictionary.h posted on the webpage. A test client called DictionaryClient.c is also included. Submit both of these files unaltered with your project. The webpage also contains a Makefile for the Stack ADT. Alter this Makefile to make the executable DictionaryClient from the source DictionaryClient.c. (Just replace "IntegerStack" everywhere it is
found with "Dictionary".) Compare the output of DictionaryClient with the file DictionaryClientOut. Note that this test client and its output are provided as a convenience to you and should not be deemed to certify your Dictionary.c as error free. For that you will need to construct your own tests in a file called DictionaryTest.c which you will also submit. Note that the Makefile provided includes clean and check utilities which you should leave in place. Do make clean to delete old binaries and make check to check your Dictionary.c for memory leaks.
Submit the files:
README
8
Makefile
Dictionary.c
DictionaryTest.c
Dictionary.h
DictionaryClient.c
to the assignment name lab5. As always start early and ask for help if anything is not completely clear.
9