$24
About This Assignment
This assignment introduces another data structure, related to a binary search tree, that can be used to provide an efficient implementation of a Dictionary. It can be completed by groups of up to three students and is due by 11:59pm on Monday, June 3.
2-3 Trees: Definition
Suppose that E is an ordered type, that is, a nonempty set of values that have a total order. A 2-3-tree, for type E, is a finite rooted tree T (like a binary search tree or a red-black tree) that satisfies the following 2-3 Tree Properties:
Every leaf in T distinct. Thus T leaves of T .
stores an element of E. and the elements stored at the leaves of T are represents a finite subset of E, namely, the set of values stored in the
Every internal node of T has (exactly) either two or three children — which are each either leaves or internal nodes of T .
If an internal node x has exactly two children — a first child , which is the root of a first subtree of the subtree with root x, and a second child , which is the root of a second subtree of the subtree with root x — then each of the values stored at the leaves of the first subtree is less than each of the values stored at the leaves of the second subtree.
If an internal node x has exactly three children — a first child , second child and third child , which are the roots of the first subtree, second subtree and third subtree of the subtree with root x, respectively — then each of the values stored at the leaves of the first subtree is less than each of the values stored at the leaves of the second subtree, and
1
(3, 9, 20)
(1, 3)
(5, 9)
(10, 16, 20)
1
3
5
9
10
16
20
Figure 1: A 2-3 Tree
each of the values stored at the leaves of the second subtree is less than each of the values stores at the leaves of the third subtree.
If an internal node x of T has exactly two children then x stores a 2-tuple (M1, M2) — where M1 is the largest value stored at any leaf of the first subtree, and M2 is the largest value stored at any leaf of the second subtree of the subtree with root x.
If an internal node x of T has exactly three children then x stores a 3-tuple (M1, M2, M3)
— where M1 is the largest value stored at any leaf of the first subtree, M2 is the largest value stored at any leaf of the second subtree, and M3 is the largest value stored at any leaf of the third subtree of the subtree with root x.
Every leaf of T is at the same level of the tree — that is, the number of edges from the root down to each leaf is the same.
Each node x of T is the root of a 2-3 tree as well. That is, the subtree with root x also satisfies properties (a)–(g), for every node x in T .
For example, the tree shown in Figure 1 is a 2-3 tree representing the set of integers
{1, 3, 5, 9, 10, 16, 20}.
Exercise: Confirm that this tree satisfies the “2-3 Tree Properties” that have now been given.
Note: While a variety of references (that are available online) describe “2-3 Trees”, they do not all describe the same kind of tree and, in particular, they do not necessarily describe the kind of tree that is to be considered in this assignment!
It will be useful to measure costs of operations on 2-3 trees in terms of the sizes of the sets of elements of E that they represent, so that it will be helpful to bound the number of internal nodes in this way too.
2
Let T be a nonempty 2-3 tree, so that it includes at least one node. Prove that if T represents a subset S ⊆ E such that |S| = N ∈ N, then T has at most N − 1 internal nodes.
It follows that (if the uniform cost criterion is used to define this) the amount of storage space needed to represent a non-empty 2-3 tree is at most linear in the size of the subset of E that it represents.
The depth of an empty 2-3 tree will be defined to be −1, just like the depth of an empty binary search tree. Otherwise it is defined to be the number of edges in a longest path from the root to a leaf in T — which is the number of edges in any path from the root down to any leaf in T , by 2-3 Tree Property (g). Useful relationships between the depth of a 2-3 tree and the size of the subset of E it represents can now be established.
Let T be a non-empty 2-3 tree with depth D ∈ N and let N be the size of the subset of E represented by T . Prove that 2D ≤ N ≤ 3D.
Let D ∈ N and suppose that |E| ≥ 2D. Prove that there exists a non-empty 2-3 tree with depth D, representing a subset of E with size exactly 2D.
The proof used to complete Problem #3 can be modified to prove that if D ∈ N and |E| ≥ 3D then there exists a non-empty 2-3 tree with depth D, representing a subset of E with size exactly 3D, as well. Thus the bounds on sizes of a 2-3 tree, given in Problem #1, are tight. That is, they cannot be improved.
2-3 Trees: Searches
If a search algorithm is to be implemented as a method, provided by some 2-3 tree T , then it can be considered as an algorithm that solves the following problem1 .
Searching in This 2-3 Tree
Precondition:
This 2-3 tree, T , satisfies the 2-3 Tree Properties given above.
A non-null key, of type E, is given as input.
1 It is assumed here, and in the rest of the assignment, that NULL elements of E are never stored in a 2-3 tree — and that no attempts to search for, insert or delete NULL elements are ever made.
3
Postcondition:
If the key is stored at a leaf in T then the leaf storing this key is returned as output. A NoSuchElementException is thrown, otherwise.
This 2-3 tree has not been changed, so it still satisfies the 2-3 Tree Properties.
Consider the following problem (which might also be solved using a method provided by this 2-3 tree) as well.
Searching in a Subtree of This 2-3 Tree
Precondition:
This 2-3 tree, T , satisfies the 2-3 Tree Properties given above.
key is a non-null input of type E.
x is a non-null in T that is also given as input.
Postcondition:
If the key is stored in the subtree of T with root x, then the leaf (in this sub-tree) storing x is returned as output. A NoSuchElementException is thrown otherwise.
This 2-3 tree has not changed, so it still satisfies the 2-3 Tree Properties.
Suppose that get is an algorithm that correctly solves the second of these problems. Then the algorithm shown in Figure 2 correctly solves the first of these problems, assuming that T.root is a reference to the root of this 2-3 tree (so that T.root is null if and only if this is an empty tree).
It is possible to modify a corresponding algorithm, for searching in a subtree of a binary tree, to obtain a simple recursive algorithm that solves the second problem. Since this is a 2-3 tree and the input code x is not null, such an algorithm is as shown in Figure 3. It is assumed here that if x is not a leaf then x.firstChild is its first child and x.firstMax is the largest element stored in the first subtree. Similarly, if x is not a leaf then x.secondChild is its second child and x.secondMax is the largest element stored in its second subtree — and if x has three children then x.thirdChild is its third child and x.thirdMax is the largest element stored in its third subtree.
Prove that the depth of the subtree with root x is a bound function for the recursive algorithm in Figure 3.
4
node search(E key) {
if (T.root == null) {
Throw a NoSuchElementException } else {
return get(key, root)
}
}
Figure 2: An algorithm to Search in this 2-3 Tree
Prove that this algorithm correctly solves the “Searching in a Subtree of This 2-3 Tree” problem that is given above.
Now let Tget(K) be an upper bound for the number of steps used by the algorithm when the depth of the subtree with the input node x as root is K, for K ∈ N. Use the uniform cost criterion to define this.
Write a recurrence for Tget(K) as a function of K. Explain why your answer (which you should make as precise as possible) is correct.
Note: You should be able to use your recurrence to show that Tget(0) ≤ 2, Tget(1) ≤ 7, and Tget(2) ≤ 12.
Use your recurrence to give an upper bound in closed form (that is, not involving a summation or recurrence) for Tget(K) and prove that your bound is correct. Once again, this should be a function of K and you should try to make it as precise as possible.
2-3 Trees: Insertions
Consider, now, the problem of inserting an element into the set stored by a 2-3 tree. If this is to be implemented as a method supplied by this tree then it can be considered to solve the following computational problem.
Insertion into This 2-3 Tree
Precondition:
This 2-3 tree, T , satisfies the 2-3 Tree properties given above.
key is a non-null input of type E.
5
node get(E key, node x) {
if (x is a leaf) {
if (key is equal to the element stored at x) {
return x } else {
Throw a NoSuchElementException
}
} else if (x has two children) {
if (key ≤ x.firstMax) {
return get(key, x.firstChild) } else {
return get(key, x.secondChild)
}
} else {
if (key ≤ x.firstMax) {
return get(key, x.firstChild)
else if (key ≤ x.secondMax) }
return get(key, x.secondChild) } else {
return get(key, x.thirdChild)
}
}
}
Figure 3: An Incomplete Algorithm to Search in a Subtree of this 2-3 Tree
Postcondition:
(a) If the input key is not already in the subset of E represented by T then
it is added to this set (with this set being otherwise unchanged). An
ElementFoundException is thrown and the set is not changed, if the key already belongs to this subset.
(b) T satisfies the 2-3 Tree properties given above.
There are a few cases where this problem is easy, even though a node has to be added:
6
Briefly describe how to solve the above problem in the following cases.
T is an empty tree, so that its root is a null node.
T is not empty, but the set it represents has size one (so that its root is a leaf),
Suppose, for the rest of this assignment, that addFirstElement is a simple algorithm (based on your answer for part (a)) that takes an input key and can be used to complete the insertion operation when T is empty, and that addSecondElement is another algorithm (based on your answer for part (b)) that also takes an input key and can be used to complete the insertion operation when the root of T is a leaf.
A problem that arises, in other cases when 2-3 trees are modified, is that internal nodes tem-porarily have illegal numbers of children.
With that noted, consider the following Modified Tree Properties, which might be satisfied by a rooted tree T when an operation is in progress.2
T satisfies 2-3 Tree Properties (a), (c), (d), (e), (f), and (g) — but not, necessarily, 2-3 Tree Properties (b) or (h).
Every internal node of T has (exactly) either one, two, three or four children — which are each either leaves or internal nodes of T . There is at most one internal node of T that has either (exactly) one or four children — all other internal nodes of T have either exactly two children or exactly three children.
If an internal node x of T has exactly one child, then this is called a first child , which is the root of a first subtree of the subtree with root x. In this case, x stores a 1-tuple (M1)
— where M1 is the largest value stored at any leaf of the first subtree of the subtree with root x.
If an internal node x of T has exactly four children — a first child , second child , third child and fourth child , which are the roots of the first subtree, second subtree, third subtree and fourth subtree of the subtree with root x, respectively — then each of the values stored at the leaves of the first subtree is less than each of the values stored at the leaves of the second subtree, each of the values stored at the leaves of the second subtree is less than each of the values stored at the leaves of the third subtree, and each of the values stored at the leaves of the third subtree is less than each of the values stored at the leaves of the fourth subtree.
If an internal node x of T has exactly four children then x stores a 4-tuple (M1, M2, M3, M4)
— where M1 is the largest value stored at any leaf of the first subtree, M2 is the largest
As for the “2-3 Tree Properties”, some of these introduce terminology that will be used in the rest of the assignment.
7
value stored at any leaf of the second subtree, M3 is the largest value stored at any leaf of the third subtree, and M4 is the largest value stored at any leaf of the fourth subtree of the subtree with root x.
Each node x of T is the root of a rooted tree satisfying these properties as well. That is, the subtree with root x also satisfies the above properties (a)–(e), for every node x in T .
Consider the following computational problem. It might not be clear why this is useful as described here (but it will be).
Insertion into Subtree of This 2-3 Tree
Precondition:
This 2-3 tree, T , satisfies the 2-3 Tree properties given above.
key is a non-null input of type E.
x is a non-null node in T .
Either key is not stored in any leaf of T at all, or it is stored in a leaf of the subtree of T with root x. If it is not stored at any of these leaves then it is possible to add a new leaf storing this key, as a new child of one of the internal nodes in the subtree with root x (without making other changes) in such a way that T satisfies the “Modified Tree” properties.
Postcondition:
If the input key already belongs to the subset of E stored at the leaves in the subtree of T with root x, then an ElementFoundException is thrown and T is not changed.
If x is a leaf that stores an element of E that is not equal to the input key then a NoSuchElementException is thrown and T is not changed.
If x is an internal node and the input key does not (initially) belong to the subset of E stored at the leaves of the subtree of T with root x, then
the input key is added to the subset of E stored at the leaves of T — which is otherwise unchanged;
either T satisfies the 2-3 Tree properties, given above, or T satisfies the “Modified Tree” properties, given above, and x is now an internal node with four children.
Suppose, now, that insertIntoSubtree is a method that solves this computational problem.
8
void insert(E key) {
if (T.root == null) {
addFirstElement(key)
} else if (T.root is a leaf) {
addSecondElement(key) } else {
insertIntoSubtree(key, T.root)
if (T.root has four children) {
fixRoot()
}
}
}
Figure 4: An Algorithm for the “Insertion into This 2-3 Tree” Problem
Suppose that the root of T is an internal node and the insertIntoSubtree method is called with an input key ∈ E and with the root of T as input. Suppose, as well, that the input key was not initially stored at any leaf of T — and that the root of T has four children when the execution of this method ends.
Briefly describe how to modify T so that the 2-3 Tree properties are restored (that is, T is a 2-3 tree, once again) — without changing the subset of E represented by T , and using only a constant number of steps.
Suppose, now, that fixRoot is a method (with no inputs) that can be used to carry out the computation described in the above question — and that this method is only applied when T satisfies the above “Modified Tree” properties and its root is an internal node with four children.
If the addFirstElement, addSecondElement, insertIntoSubtree and fixRoot meth-ods are as described above, then an insert method that solves the “Insertion into This 2-3 Tree” problem is as shown in Figure 4 — where “ T.root” is a reference to the root of T .
The only method that has not been described is the insertIntoSubtree method that solves the above “Insertion into Subtree of This 2-3 Tree” problem, and which is used at line 5. It will be useful to describe another pair of methods that will be used by this one.
Suppose that T satisfies the above “Modified Tree” properties and x is an internal node of T , whose children are also internal nodes, such that either
– one of the children of x has four children, or
9
– there is no internal node of T with either one or four children at all, so that T is a 2-3 tree.
Suppose, as well, that if it called with input x, then the raiseSurplus method modi-fies T , without changing the subset of E represented by T or changing the position of x in T (that is, without changing the distance or path from x to the root), so that T still satisfies the above “Modified Tree” properties and either
– x has four children, or
– there is no internal node of T with either one or four children, so that T is a 2-3 tree.
One way to perform this computation is to create an ArrayList of nodes, and another ArrayList of the largest elements of E stored at the subtrees with these nodes as roots. If none of the children of x has four children then the first ArrayList will simply store these children. On the other hand, if some child of x has four children then this should be replaced by two internal nodes, that each have two children, namely, two of the four children of this child. In effect this child of x should be split into a pair of internal nodes, increasing the number of children of x by one.
With a bit of work, it can be confirmed that the number of steps needed for an execution of this method is bounded by a constant.
Suppose, instead, that T is a 2-3 tree, x is an internal node of T whose children are leaves, key is a non-null element of E that is not stored in any of the leaves of T , and the the “Modified Tree” properties would still be satisfied if x was given another child, namely, a new leaf storing the key.
Suppose that, if called with the above key and internal node x as inputs, the method addLeaf adds another leaf, storing the input key, as another child of x — so that T still satisfies the “Modified Tree” properties, the subset of E represented has been changed by including key (and making no other changes), and either x has four children or T is a 2-3 tree.
Once again, an ArrayList — of elements of E, to be stored at the children of x — can be useful as part of an implementation this method, and this method can also be implemented so that the number of steps used by an execution of the method is at most a constant.
Suppose, now, that the insertIntoSubtree method is as shown in Figure 5 on page 11. It is assumed that x.firstChild, x.secondChild and x.thirdChild are the first, second and third children of x, respectively, and that x.firstMax, x.secondMax and x.thirdMax are the largest elements of E stored that the leaves of the subtrees of T with these children of x as roots (when these children exist).
10
void insertIntoSubtree(E key, node x) {
if (x is a leaf) {
Set e to the be the element of E stored at x
if (e is equal to the input key) {
Throw an ElementFoundException } else {
Throw a NoSuchElementException
}
6, } else {
try {
if (x has two children) {
if (key ≤ x.firstMax) {
insertIntoSubtree(key, x.firstChild) } else {
insertIntoSubtree(key, x.secondChild)
}
} else {
if (key ≤ x.firstMax) }
insertIntoSubtree(key, x.firstChild)
} else if (key ≤ x.secondMax) {
insertIntoSubtree(key, x.secondChild) } else {
insertIntoSubtree(key, x.thirdChild)
}
};
raiseSurplus(x);
} catch (NoSuchElementException ex) {
addLeaf(key, x)
}
}
}
Figure 5: The insertIntoSubtree Method
10. Suppose that the precondition for the “Insertion into Subtree of This 2-3 Tree” problem
11
is satisfied and the insertIntoSubtree method is executed with the input key and a node x that is a leaf in T . Tracing through the execution of the pseudocode shown in Figure 5, as needed, explain why this method would terminate with the postcondition for the “Insertion into Subtree of This 2-3 Tree” problem satisfied.
Suppose, again, that the precondition for the “Insertion into Subtree of This 2-3 Tree” problem is satisfied and the insertIntoSubtree method is executed with the key and with an internal node x whose children are leaves, as inputs. Tracing through the execution of the pseudocode shown in Figure 5, as needed, explain why this method would terminate with the postcondition for the “Insertion into Subtree of This 2-3 Tree” problem satisfied.
Note: It might be helpful to compare the algorithms shown in Figures 3 and 5, and make use of similarities in their structure and analysis.
Describe how it can be proved that the insertIntoSubtree algorithm correctly solves the “Insertion into Subtree of This 2-3 Tree” problem. Your answer does not need to be a complete proof. However, you should identify the proof technique that is used, describe what you are inducting on and give both an “Inductive Hypothesis” and “Inductive Claim” whenever induction is to be used, and briefly describe any cases that must be handled (including, but not limited to, situations considered when solving the above problems).
Briefly describe how to prove that the number of steps executed by this method (when it starts with its problem’s precondition satisfied) is in O(depth(x)) where x is the node given as input — and where the Uniform Cost criterion is used to define this number of steps.
It is not necessary to give a complete proof. However, a bound function for this method should be identified. and you should explain how the upper bound for the number of steps, given above, is established.
Once again, an examination of the insert method shown in Figure 4 should confirm that this correctly solves the “Insertion into This 2-3 Tree” problem. Furthermore, the number of steps used by an execution of this method is at most linear in the depth of this tree T .
2-3 Trees: Deletions
Now consider the problem of deleting a value from the subset of E represented by a 2-3 tree.
This problem can be defined as follows.
12
Deletion from This 2-3 Tree
Precondition:
This 2-3 tree, T , satisfies the 2-3 Tree properties given above.
key is a non-null input of type E.
Postcondition:
(a) If the input key belongs to the subset of E represented by T then
the key is removed from this set, which is otherwise unchanged. A
NoSuchElementException is thrown and T is not changed if the key does not belong to this subset of E.
T satisfies the 2-3 Tree properties given above.
Describe related problems to be solved, algorithms that solve these problems, and sketches of proofs of correctness, as needed to describe a solution for this computa-tional problem too.
It will probably be helpful to consider the development of the solution for the “Insertion into This 2-3 Tree” problem — that you have completed if you have solved the problems before this — and describe significant differences between these two problems and their solutions.
In both case some sort of “problem node” has probably been identified and then moved from the area near a leaf up to the root of the tree. Say what kind of “problem node” is involved for the “Deletion” problem instead of the “Insertion” one.
You may find that lots of the related problems are similar to the ones needed for “Insertion” problem. Make sure that you clearly identify which problems are sim-ilar, as you go, and describe the differences in enough detail so that a reader can understand this.
One difference might be that grandchildren of a node should be considered, at a place where the children a node got considered by the corresponding part of the “Insertion” algorithm.
Please allow lots of time to answer this question! It will probably required more work (and a different kind of activity) than the questions before it.
13
Implementation as a Java Program
An incomplete Java program, TwoThreeTree.java, is now available. Details of the implementation of a node for a 2-3 tree are consistent with the pseudocode given in this assignment.
Please complete this, by supplying code for the methods whose content currently con-sists of the phrase
FOR YOU TO COMPLETE
(or something very similar, if there are typographical errors causing similar comments to be used too).
Do not change the code that has already been supplied: This may cause tests used to evaluate your submission to fail. A complete solution has been prepared and tested, so it has already been confirmed that no such changes are needed: If you feel that a change is needed then this is probably evidence that you do not understand the problem to be solved, or that you need to change the code you are including.
However, you will need to introduce additional methods. It might be helpful to add one or two more methods to reduce repetitiveness in the code that you would otherwise need to provide to complete an implementation of an insert algorithm. You should also include methods solving the additional computational problems you discussed when answering the previous question, in order to complete the implementation of a delete algorithm.
Additional code for testing — with information about how to use it — will be made avail-able online very soon (and, in plenty of time for you to use it).
14