$24
Objectives: Programming
Write a C++ class to create a binary search tree. Your program should empirically calculate the average search cost for each node in a tree and output a tree, level by level, in a text format.
1. Programming (80 points).
(a) (35 points) Write a class Binary Search Tree – include copy and move constructors, copy and move assignment operators.
(b) (25 points) Populate a binary search tree according to the instructions provided below:
▪ Read integer data from the file designated by the user. Every line of the file contains one unique integer number (no duplicates). The size of input file is not given and the termination of input data is done by detecting the end-of-file.
▪ Print the input data on the screen for small input size.
▪ Create a binary search tree using the input data. Use the definition of a node and binary search tree provided in the header file.
▪ Calculate the search cost for each node when it is inserted in a binary search tree node.
– A node element has at least two data fields: key and search_cost.
– For each node, count and store the number of comparisons required for searching a node (i.e. the number of comparisons = the search cost for the node = 1+ the depth of the node).
(c) (20 points) Application
▪ Print the tree data by performing in-order traversal operation. If the number of nodes is less than 16, print on the screen the resulting tree along with the information about the nodes (for each node, print its key and search cost). Otherwise, print this tree and information about its nodes into a file. See the example below.
▪ Print the binary search tree on the screen level-by-level in the case when the number of nodes is less than or equal to 16 along with the information about the nodes (for each node, print its key and search cost). For trees larger than 16, do not print to a file as the file may be large, see the example at 2.
▪ Calculate the average search cost for all nodes in the tree and print it.
▪ Sum up the search cost over all the nodes in your tree as you traverse the tree by the in-order traversal.
▪ Divide the sum by the total number of nodes in the tree, and the result is the average search cost for all the tree nodes
▪ Print the total number of nodes
2. Example: Input data:
3
5
7
9
10
11
Create a binary search tree and provide information about each node when you display the tree.
2
Key Search Time
• 1
• 2
• 2
• 3
10 3
11 4
Total number of nodes is 6
Use the in-order traversal to output a tree on the screen or to a file. Each node is printed in the
following format:
Key[SearchCost], that is,
3[2] 5[1] 7[3] 9[2] 10[3] 11[4]
Sum of the search cost over all the nodes in the tree is: 2 + 1 + 3 + 2+3 + 4 = 15. Average search cost:
15/6 = 2.5.
Average search cost is 2.5
Output the tree level-by-level to a file (missing elements are denoted by X):
5[1]
3[2] 9[2]
X X 7[3] 10[3]
XXXXXXX11[4]
3. Download files containing integer data from the course website.
(a) The files 1p, 2p, ..., 12p contain 21 − 1, 22 − 1,..., and 212 − 1 integers respectively. The integers make 12 perfect binary trees where all leaves are at the same depth. Calculate and record the average search cost for each perfect binary tree.
(b) The files 1r, 2 r, ..., 12r contain same number of integers as above. The integers are randomly ordered and make 12 random binary trees. Calculate and record the average search cost for each tree.
(c) The files 1l, 2l, ..., 12l contain same number of integers as above. The integers are in increasing order and make 12 linear binary trees. Calculate and record the average search cost for each tree.
(d) Make a table and a plot showing the average search cost (y-axis) versus the number of tree nodes (x-axis) of all perfect binary trees, random binary trees and linear binary trees.
(e) Provide the output in the text format for the files 1p–4p, 1r–4r, and 1l–4l. Another option for tree drawing is to output elements under text mode using a queue. The nodes will be printed according to their depth level. Missing nodes will be represented by the symbol X. A possible solution is to fill the missing nodes with fake nodes in the data structure when printing the tree. Example:
5[1]
3[2] 9[2]
X X 7[3] 10[3]
XXXXXXX11[4]
4. Hints
(a) Besides using links/pointers to represent a binary search tree, you may store the binary tree in a vector. This implementation might be useful, especially for the printing of a tree level by level.
(b) To output a binary search tree level-by-level, you may use breadth-first-search (BFS) algorithm based on queue.
3
i. You may use the doubly linked list class, or use STL Queue
ii. The following is a pseudo-code of the BFS algorithm
output_BST_level_by_level(BinarySearchTree T) Queue q
q.enqueue(T.root)
while (not q.empty()) do TreeNode node = q.dequeue()
/* output a node (you need to determine whether to output a newline) */
print node.key
if (there are still unenqueued non-nullptr nodes in the tree) then
/* enqueue children here for later output */ endif
done
Report (20 points)
Write a brief report that includes the following:
1. A description of the assignment objective, how to compile and run your programs, and an explanation of your program structure (i.e. a description of the classes you use, the relationship between the classes, and the functions or classes in addition to those in the lecture notes).
2. A brief description of the data structure you create (i.e. a theoretical definition of the data structure and the actual data arrangement in the classes).
3. A description of how you implemented the calculation of (a) individual search cost and (b) average search cost and explain which tree operation (e.g. find, insert) was helpful. Analyze the time complexity of (a) calculating individual search cost and (b) summing up the search costs over all the nodes.
4. Give an individual search cost in terms of n using big-O notation. Analyze and give the average search costs of a perfect binary tree and a linear binary tree using big-O notation, assuming that the following formulas are true (n denotes the total number of input data).
Pd=0
−
1
2d(d + 1) ' (n + 1) · log2(n + 1) − n
and
Pd=1 d ' n(n + 1)/2
log2(n+1)
n
5. Include a table and a plot of an average search costs you obtain. In your discussions of experimental results, compare the curves of search costs with your theoretical analysis results in item 3.
4