$24
Question 1. (1 marks) In this question, you must use the insertion and deletion algorithms as described in the \Balanced Search Trees: AVL trees" handout posted on the course web site.
Insert keys 0, 25, 19, 5, -2, 28, 13, -5, 2, 6, 14, 7 (in this order) into an initially empty AVL tree, and show the resulting AVL tree T , including the balance factor of each node (only the nal tree should be shown).
Delete key 2 from the above AVL tree T , and show the resulting AVL tree, including the balance factor of each node.
In each of the above questions, only the nal tree should be shown: intermediate trees will be disregarded, and not given partial credit.
Question 2. (1 marks) You are tasked with designing a data structure that maintains a set of books for an online bookstore. A book is de ned as a 3-tuple (identif ier; price; rating), where
identif ier is a unique positive integer which identi es the book.
price is a positive real number which represents the price of the book in dollars. rating is a real number in the range [0; 5], which represents how good the book is.
Note that you can not assume that prices and ratings are unique: there may be several books which have the same price, and there may be several books which have the same rating.
Name a standard data structure D that can store the set of books such that each of the following operations can be done in O(log n) time in the worst-case (where n is the number of books in D):
AddBook(D; x): Insert x which is a new book of the form (identif ier; price; rating) into D.
SearchBook(D; id): Return the price and rating of the book in D whose identif ier is id. If there is no book in D whose identif ier is id, return nil.
Brie y describe what each node u of your data structure contains, and what is the key that you are using when adding a new book. Which standard operations of your data structure you are using to implement AddBook and SearchBook?
In the Parts b, c, d, e below:
For each tree that you de ne, describe what each node of your tree contains, and what is the key that you are using when inserting a new item into this tree.
For each operation, you should give a clear and concise English description of the algorithm implementing the operation.
Also give brief explanations of why each of your algorithms achieve the worst-case time complexity speci ed in the subquestion (where n is the number of books in the data structure).
Modify D to support the following operation, in addition to the operations of Parts a:
BestBookRating(D; p): Let r be the maximum rating among all books in D whose price is at most p.
Then return r. If no book has price at most p, then return 1.
In addition to the English description of the algorithm implementing the above operation, you must also give the corresponding pseudocode. Also describe in English any changes you make to the implementation of the previously de ned operations. The worst-case time complexity of each operation should be O(log n).
Hint: Use an augmented AVL tree, in addition to the data structure used to implement Part a.
Modify D to support the following operation, in addition to the operations of Parts a, b:
AllBestBooks(D; p): Let r be the maximum rating among all books in D whose price is at most p. Then return a pointer to a list of all books in D whose rating is r. If no book has price at most p, then return nil.
Also describe in English any changes you make to the implementation of the previously de ned operations (don’t use pseudo-code). The worst-case time complexity of each operation should be O(log n).
Hint: Use another AVL tree, in addition to the data structures used to implement Parts a, b.
Modify D to support the following operation, in addition to the operations of Parts a, b, c:
IncreasePrice(D; p): Increase the price of every book in D by p dollars. Assume that p is a positive real number.
Also describe in English any changes you make to the implementation of the previously de ned operations (don’t use pseudo-code). The worst-case time complexity of the IncreasePrice operation should be O(1). The worst-case time complexity of the previously de ned operations should remain O(log n).
Modify D to support the following operation, in addition to the operations of Parts a, b, c, d:
DeleteBook(D; id): Remove from D the book whose identif ier is id.
Also describe in English any changes you make to the implementation of the previously de ned opera-tions (don’t use pseudo-code). The worst-case time complexity of the DeleteBook operation should be O(log n), where n is the number of books in D. The worst-case time complexity of the previously de ned operations should remain unchanged.
Question 3. (1 marks)
Give an e cient algorithm which, given two sets A and B, outputs every element of A that is not in B. More precisely, the algorithm’s input are two sets A and B given as unsorted arrays A[1::n] and B[1::n], each containing n distinct integers. The algorithm must print the elements of the set
B = fe j e 2 A and e 2= Bg.
The expected time complexity of your algorithm should be O(n), under some assumptions that we discussed in class.
Hint: What data structures or algorithms that we learned in class involve reasoning about probability?
Note: Do not assume that the numbers in the array are small, or they have a small range, e.g., they could be arbitrary integers. So a solution based on a direct access table or on linear time sorting is not acceptable.
Describe your algorithm clearly and concisely in English, and then give the pseudo-code.
Explain why the expected running time of your algorithm is O(n). Explicitly state every assumption that you need for your complexity analysis.
What is the worst-case running time of your algorithm? Use the notation. Justify your answer.
[The questions below will not be corrected/graded. They are given here as interesting problems that use material that you learned in class.]
Question 4. (0 marks) Consider an abstract data type that consists of a set S of integers on which the following operations can be performed:
Add(i): Adds the integer i to S. If this integer already is in S, then S does not change
Average(t): Returns the average of all elements of S that are less than or equal to the integer t. If all the elements of S are greater than t, then return 0.
Describe how to implement this abstract data type using an augmented AVL tree T . Each operation should run in O(log n) worst-case time, where n = jSj. Since this implementation is based on a data structure and algorithms described in class and in the AVL handout, you should focus on describing the extensions and modi cations needed here.
Give a precise and full description of your data structure. In particular, specify what data is associated with each node, specify what the key is at each node, and specify what the auxiliary information is at each node. In particular, what is (are) the augmented eld(s), and what identity should this (these) elds satisfy. Illustrate this data structure by giving an example of it (with a small set of your own choice). b. Describe the algorithm that implements each one of the two operations above, and explain why each one takes O(log n) time in the worst-case. Your description and explanation should be in clear and concise English. For the operation Average, you should also give the algorithm’s high-level pseudocode.
Question 5. (0 marks)
The task in this question is to compute the medians of all pre xes of an array. As input we are given the array A[1 : : n] of distinct integers. Using a heap data structure, design an algorithm that outputs another array M[1 : : n], so that M[i] is equal to the median of the numbers in the subarray A[1 : : i]. Recall that when i is odd, the median of A[1 : : i] is the element of rank (i+1)=2 in the subarray, and when i is even, the median is the average of the elements with ranks i=2 and i=2 + 1. Your algorithm should run in worst-case time O(n log n).
Describe your algorithm in clear and concise English, and also provide the corresponding pseudocode. Argue that your algorithm is correct.
Justify why your algorithm runs in time O(n log n).
Question 6. (0 marks)
Dynamic storage allocation is a resource management task that is required at several levels of computer systems: for example, an operating system must allocate chunks of main memory to processes, and a le system must allocate chunks of disk space to les. We can describe the problem of storage management in general terms as follows. There is a store S of N equal-size cells, indexed as S[i] for i 2 f1; 2; : : : ; Ng. In practice, the cells of the store are individually addressable units of some type of memory. For example, they may be bytes, words or pages of main memory; or they may be blocks or sectors of disk space.
The storage manager maintains at all times a pool of available fragments. Intuitively, a fragment is a maximal sequence of consecutive cells that are available to be allocated to a process that needs them. When a process requests a fragment of size ‘, the storage manager determines if the pool of available fragments contains a fragment of size k ‘. If it does not, the request is rejected. If the pool contains a fragment of size k ‘, the storage manager removes it from the pool, splits it into one fragment of size
‘ and one fragment of size k ‘, allocates the former to the requesting process, and returns the latter (if k ‘ 6= 0) to the pool. When a process is done with a fragment that it was previously allocated, it returns
4
that fragment to the storage manager. The storage manager returns the fragment to the storage pool after coalescing it with its adjacent fragments, if such fragments are presently in the pool.
More precisely, a fragment is a pair (a; k), where a; k are positive integers such that a + k 1 N; a is the fragment’s start address (the index of the rst cell in the fragment) and k is its size (the number of cells in the fragment). Thus, fragment (a; k) corresponds to the subarray S[a::a + k 1] of the store S. Two fragments F = (a; k) and F 0 = (a0 ; k0 ) are adjacent if the start address of one is immediately after the last cell of the other (i.e., a = a0 + k0 or a0 = a + k); F and F 0 are overlapping if some cell belongs to both of them (i.e., the intervals [a; a + k 1] and [a0 ; a0 + k0 1] are not disjoint). The pool of available fragments is a set of fragments that does not contain any adjacent or overlapping fragments. Initially, the pool of available fragments consists of a single fragment, namely (1; N).
We can formulate storage management as an abstract data type. The object manipulated by this ADT is the pool of available fragments, and there are two operations:
Allocate(P; ‘): Intuitively, this is a request to allocate a fragment of size ‘.
P is a pool of available fragments, and ‘ is an integer such that 1 ‘ N. If P contains no fragment
whose size is at least ‘, the operation returns \reject". If P contains some fragment, say (a; k), such that k ‘, the operation returns (a; ‘), and adjusts P by replacing (a; k) with (a + ‘; k ‘), if k ‘
| or replacing it with nothing, if k = ‘.
Release(P; (a; ‘)): Intuitively, this returns a previously allocated fragment to the pool.
P is a pool of available fragments, and (a; ‘) is a fragment that does not overlap any fragment in P . The operation adjusts P by returning the fragment (a; ‘) to it, after coalescing it with any adjacent fragments.
When the operation Allocate(P; ‘) is invoked and the pool P contains several fragments of size ‘ or more, there are various strategies that the storage manager could use to decide which fragment to use. Following are three common strategies:
First- t: choose the fragment of adequate size that has the smallest possible start address.
Best- t: choose a smallest fragment of adequate size.
Worst- t: choose a largest fragment of adequate size.
In this question, you must describe how to implement the storage management ADT described above using an augmented AVL tree (which represents the pool P of currently available fragments). Your imple-mentation should have the property that the Allocate operation uses the rst- t strategy. Further-more, each operation Allocate and Release should take O(log n) time in the worst-case, where n is the number of fragments currently in the pool P .
Since this implementation of P is based on a data structure and algorithms described in class and in the AVL handout, you should focus on describing the extensions and modi cations needed here.
Give a precise and full description of your data structure. In particular, specify what data is associated with each node, specify what the key is at each node, and specify what the auxiliary information is at each node. Illustrate this data structure by giving an example of it (on a small pool P of available fragments of your own choice).
Describe the algorithms for Allocate and Release, and explain why each one takes O(log n) time in the worst-case.
5
Question 7. (0 marks) You are given a list L of n, not necessarily distinct, integers. Your task is to devise an algorithm that outputs a list of the distinct integers in L, in order of non-increasing frequency (the frequency of an element in L is the number of times it occurs in L). For example, if L is 2; 5; 4; 4; 2; 3; 4; 3, then a suitable output is 4; 3; 2; 5; the only other suitable output for this list is 4; 2; 3; 5. In this example, L contains only small integers, but this is not always the case. You cannot make any assumption on the integers in L (e.g., their maximum size, or distribution). In particular, some integers of L can very large, and so it is not feasible to use tables of size proportional to these integers.
Give a Hash Table-based algorithm for this problem whose expected time complexity is O(n), under some Hash Table assumptions that you must clearly state. Explain why, your algorithm achieves this time complexity under these assumptions.
What is the worst-case time complexity of your algorithm? Use the notation and justify your answer. Hint: As part of your algorithm, you may nd useful to discover a way to sort n numbers in the range 0 to n in O(n) worst-case time.
Give an algorithm for this problem whose worst-case time complexity is O(n log n). Explain why your algorithm achieves this time complexity.
Question 8. (0 marks)
In this question you will explore di erent algorithms to perform the join of two relations, a very important operation in relational databases. We consider a simpli ed setting where we wish to compute the join of relation R with attributes A and C, and relation S with attributes B and C. Notice that the two relations have a common attribute, C. Each relation is a set of pairs. For each pair (a; c) of R, a is a value of attribute A and c is a value of attribute C. Similarly, for each pair (b; c) of S, b is a value of attribute B and c is a value of attribute C. It may be useful to think of R as a two-dimensional array, with two columns, the rst corresponding to A and the other corresponding to C: each row of the two-dimensional array corresponds to a pair (a; c) of R; so the number of rows of the array is equal to the number of pairs in R. Similar comments apply to S.
The join of R and S, denoted R ./ S, is a relation with attributes A, B and C (i.e., the union of the attributes of R and S). It consists of the set of all triples (a; b; c) such that (a; c) is a pair in R and (b; c) is a pair in S. Notice that the two pairs that are being joined to produce the triple (a; b; c) of R ./ S must have the same value in the common attribute C.
Another way of thinking about R ./ S is as follows: We consider every possible combination of a pair (a; c) from R and a pair (b; c0 ) from S. If c = c0 | i.e., if the two pairs have the same value in the common attribute | then this combination contributes the triple (a; b; c) to the join. If c 6= c0 then this combination contributes nothing to the join.
For example, suppose that R is the relation Teaches with attributes ProfName and CourseId; and S is the relation Takes with attributes StudName and CourseId. A pair (p; c) is in Teaches if and only if professor p teaches course c. A pair (s; c) is in Takes if and only if student s takes course c. The join Teaches ./ Takes is the relation with attributes ProfName, StudName, CourseId; it contains a triple (p; s; c) if and only if professor p teaches student s in course c.
The obvious way to compute R ./ S is to consider each combination of pairs from R and S and create a triple from each such combination where the two pairs have the same value in the common attribute. Obviously this algorithm takes time (mn), where m is the number of pairs in R and n is the number of pairs in S.
In a way, this is the best we can hope for: If all pairs in R and all pairs in S have the same value in the common attribute C, then there will be mn triples in R ./ S, and we can’t create a set of mn triples faster than in (mn) time! In practice, however, it is very unlikely that every pair in R joins with every pair in S; in most cases, the size of R ./ S is much smaller than mn. For instance, in our example of the Teaches and Takes relations, if these happened to describe, respectively, the courses taught by U
6
of T faculty and the courses taken by U of T students in a particular semester, the rst relation would contain about 4,000 pairs (approximately 2,000 professors each teaching 2 courses) and the second relation would contain about 250,000 pairs (approximately 50,000 students, each taking 5 courses). In this case the product mn is approximately one billion, while the actual size of Teaches ./ Takes will be approximately 250,000 | more than three orders of magnitude smaller! (There could be more records in Teaches ./ Takes than in Takes, in case a course is co-taught by several professors, but such instances are rare, so typically the size of Teaches ./ Takes will be only a little bit larger than that of Takes.)
In view of this example, we would like to have an algorithm that computes R ./ S in time (k+f(m; n)), where k is the actual size of R ./ S and f(m; n) is as small a function as possible.
In the following two questions you are asked to devise and describe such algorithms. You should describe each algorithm in clear and precise English, and illustrate the data structure(s) that you use with simple example(s); no pseudocode is necessary (but you can use pseudo-code as an additional explanation). To the extent your algorithm makes use of algorithms or data structures discussed in this course or in its prerequisites, you can simply state what algorithm(s) and data structure(s) you use without describing them in detail.
In the following, assume that each of R and S is given as a two dimensional array, where each column corresponds to an attribute and each row corresponds to a pair of the relation. You can not make any assumption on the size or distribution of values contained in R or S (i.e., the values associated with attributes A, B, or C). In particular, some of these values may be very large (e.g., they may be 128-bit integers), and so it is not feasible to use arrays of size proportional to these values.
Let k be the size of the join R ./ S, and N be the size of the larger of the two given relations R and S, i.e., N = max(n; m).
Describe an algorithm that computes R ./ S in worst-case time O(k + N log N). In other words, the worst-case running time of your algorithm, with respect to all the inputs R and S that have N = max(n; m) and whose join is of size k, is O(k + N log N). Your should describe your algorithm as we explained above (start with a few English sentences explaining the high-level idea of your algorithm).
Explain why the worst-case time complexity of your algorithm is O(k + N log N).
Describe an algorithm that computes R ./ S in expected time (k + N) under some reasonable assumptions that you should state. Your should describe your algorithm as we explained above (start with a few English sentences explaining the high-level idea of your algorithm).
Explain why the expected running time of your algorithm is (k + N).
What is the worst-case time complexity of this algorithm? Justify your answer.
Hint: Note that R or S may have many pairs with the same value c in attribute C. You should think about this case carefully when you formulate your assumptions, and when you analyse the time complexity of your algorithm.
7