$24
Important Note
Please do not start the assignment before reading the notes in HAND IN section (last page):
Question 1 (10 points)
Give the prefix, infix, and postfix expressions obtained by preorder, inorder, and postorder traversals, respectively, for the expression tree below:
Question 2 (10 points)
Draw the initially empty Binary Search Tree after operations as follows (show all intermediate steps):
insert 58, 40, 49, 21, 95, 75, 150, 94, 30, 80, 67, 12, 77, 100; then delete 67, 30, 58.
Question 3, Programming Assignment (60 points)
You are to write a C++ program to count the frequency (number of occurrences) of n-grams in a text file. Definition of n-gram is simple: it is the number of consecutive letters in a given text. For example, for the word bilkent the 2-grams (bigrams) are bi, il, lk, ke, en, nt. You may ignore any capitalizations and assume that the text file contains only English letters 'a'...'z', 'A'…'Z', and the blank space to separate words. Your program should take the value of n as a parameter and construct the corresponding BST accordingly. While processing the input text, if your program encounters a word that has length smaller than the value of parameter n, you can simply ignore that word and process following words.
1
You are to use a pointer based implementation of a Binary Search Tree (BST) to store the n-grams and their counts. (You can use the source codes available in the course book as well as you can implement a BST yourself.) Each node object is to maintain the associated n-gram as a string, its current count as an integer, and left and right child pointers. On top of the regular operations that a BST has, you must implement the following functions:
• addNgram: adds the specified n-gram in the BST if not already there; otherwise, it simply increments its count.
• generateTree: reads the input text and generates a BST of n-grams. In this function, you should detect all of the n-grams in the input text and add them to the tree by using the addNgram function. This function also requires the parameter n.
• getTotalNgramCount: recursively computes and returns the total number of n-grams currently stored in the tree.
• isComplete: computes and returns whether or not the current tree is a complete tree.
• isFull: computes and returns whether or not the current tree is a full tree.
• operator<<: recursively prints each n-gram in the tree in alphabetical order along with their frequencies. This should be a global function.
Below is the interface of an NgramTree class for implementing the above functionality as well as a main function to test it with a sample input text file. These will be used for evaluation purposes. Make sure your code runs correctly against these. We will test your program extensively.
// NgramTree.h
class NgramTree {
public:
NgramTree();
~NgramTree();
void addNgram( string ngram );
int getTotalNgramCount();
bool isComplete();
bool isFull();
void generateTree( string fileName, int n );
private:
// ...
friend ostream& operator<<( ostream& out, NgramTree tree );
};
2
• hw2.cpp #include <iostream> #include <string>
using namespace std;
#include "NgramTree.h"
int main( int argc, char **argv ) {
NgramTree tree;
string fileName( argv[1] );
int n = atoi( argv[2] );
tree.generateTree( fileName, n );
cout << "\nTotal " << n << "-gram count: " << tree.getTotalNgramCount() << endl; cout << tree << endl;
cout << n << "-gram tree is complete: " << (tree.isComplete() ? "Yes" : "No") <<
// Before insertion of new n-grams
cout << "\nTotal " << n << "-gram count: " << tree.getTotalNgramCount() << endl;
tree.addNgram( "samp" );
tree.addNgram( "samp" );
tree.addNgram( "zinc" );
tree.addNgram( "aatt" );
cout << "\nTotal " << n << "-gram count: " << tree.getTotalNgramCount() << endl; cout << tree << endl;
cout << n << "-gram tree is complete: " << (tree.isComplete() ? "Yes" : "No") <<
cout << n << "-gram tree is full: " << (tree.isFull() ? "Yes" : "No") << endl;
return 0;
}
3
// input.txt
this is sample text
and thise is all
// Sample output
Total 4-gram count: 6
"ampl" appears 1 time(s)
"hise" appears 1 time(s)
"mple" appears 1 time(s)
"samp" appears 1 time(s)
"text" appears 1 time(s)
"this" appears 2 time(s)
4-gram tree is complete: No
4-gram tree is full: No
Total 4-gram count: 8
"aatt" appears 1 time(s)
"ampl" appears 1 time(s)
"hise" appears 1 time(s)
"mple" appears 1 time(s)
"samp" appears 3 time(s)
"text" appears 1 time(s)
"this" appears 2 time(s)
"zinc" appears 1 time(s)
4- gram tree is complete: No
4-gram tree is full: No
The following is the BST constructed for the input text when n = 4 .
4
Question 4 (20 points)
Analyze the worst-case running time complexities of the addNgram and operator<< functions in the previous question using the big-O notation.
HAND IN
• You should prepare the answers of Questions 1, 2, and 4 using a word processor (in other words, do not submit images of handwritten answers).
• This homework will be graded by Fatma Kahveci. Contact your TA Nogay Evirgen directly for any homework related questions.
• Before 23:55 of Nov 11, 2021, upload your solutions using Moodle. You should upload a single zip file that contains
◦ hw2.pdf, the file containing the answers to Questions 1, 2 and 4, and the sample output of the program.
◦ NgramTree.cpp, NgramTree.h, and hw2.cpp, and any additional files if you wrote additional classes in your solution, and
◦ readme.txt, the file containing anything important on the compilation and execution of your program in Question 3.
◦ You should be able to compile your program on a Linux terminal with the following command: g++ *.cpp –o hw2
◦ Do not forget to put your name and student id in all of these files. Well comment your implementation. Add a header comment to the beginning of each file as follows:
/**
▪ Title: Trees
▪ Author: Firstname LastName
▪ ID: 21000000
▪ Assignment: 2
▪ Description: description of your code
*/
◦ Do not put any unnecessary files such as the auxiliary files generated from your favorite IDE.
◦ Name your zip file as follows: name_surname_hw2.zip; any violation of these will cause a significant point deduction from your grade.
◦ Keep all the files before you receive your grade.
• For this assignment, you must use your own implementation of binary search trees. In other words, you cannot use any existing binary search tree code from other sources such as the binary search tree class in the C++ standard template library (STL). However, you can adapt the binary search tree codes in the
5
Carrano (7th ed.) or Carrano-Henry (6th ed.) book and/or our lecture slides. You will get no points if you do not use binary search trees as indicated.
IMPORTANT:
• Although you may use any platform or any operating system to implement your algorithms and obtain your experimental results, your code should work on the dijkstra server (dijkstra.ug.bcc.bilkent.edu.tr). We will compile and test your programs on that server. If your C++ code does not compile or execute in that server, you will lose points.
• Attention: For this assignment, you are allowed to use the codes given in our textbook and/or our lecture slides. However, you ARE NOT ALLOWED to use any codes from other sources (including the codes given in other textbooks, found on the Internet, belonging to your classmates, etc.). Furthermore, you ARE NOT ALLOWED to use any data structure or algorithm related function from the C++ standard template library (STL).
Do not forget that plagiarism and cheating will be heavily punished. Please do the homework yourself.
6