$29
Purpose of the Project: The purpose of this assignment is to write a program to implement autocomplete for a given set of N strings and nonnegative weights. That is, given a prefix, find all strings in the set that start with the prefix, in descending order of weight.
What is Autocomplete: Autocomplete is an important feature of many modern applications. As the user types, the program predicts the complete query (typically a word or phrase) that the user intends to type. Autocomplete is most effective when there are a limited number of likely queries. For example, the Internet Movie Database uses it to display the names of movies as the user types; search engines use it to display suggestions as the user enters web search queries; cell phones use it to speed up text input.
In these examples, the application predicts how likely it is that the user is typing each query and presents to the user a list of the top-matching queries, in descending order of weight. These weights are determined by historical data, such as box office revenue for movies, frequencies of search queries from other Google users, or the typing history of a cell phone user. For the purposes of this
CS146 Project 3
assignment, you will have access to a set of all possible queries and associated weights (and these queries and weights will not change).
In this assignment, you will implement autocomplete by sorting the queries in lexicographic order; using binary search to find the set of queries that start with a given prefix; and sorting the matching queries in descending order by weight.
Problem1. (Autocomplete Term) Implement an immutable comparable data type Term in Term.java that represents an autocomplete term: a string query and an associated real-valued weight. You must implement the following API, which supports comparing terms by three different orders: lexicographic order by query string (the natural order); in descending order by weight (an alternate order); and lexicographic order by query string but using only the first
r characters (a family of alternate orderings). The last order may seem a bit odd, but you will use it in Problem 3 to find all terms that start with a given prefix (of length r).
Corner cases. The constructor should throw a java.lang.NullPointerException if query is null and a java.lang.IllegalArgumentException if weight is negative. The byPrefixOrder() method should throw a
java.lang.IllegalArgumentException if r is negative.
Performance requirements. The string comparison functions should take time proportional to the number of characters needed to resolve the comparison.
CS146 Project 3
Problem 2. (Binary Search Deluxe) When binary searching a sorted array that contains more than one key equal to the search key, the client may want to know the index of either the first or the last such key. Accordingly, implement a library of static methods BinarySearchDeluxe.java with the following API:
Corner cases. Each static method should throw a java.lang.NullPointerException if any of its arguments is null. You should assume that the argument array is in sorted order (with respect to the supplied comparator).
Performance requirements. The firstIndexOf() and lastIndexOf() methods should make at most 1+[logN] compares in the worst case, where N is the length of the array. In this context, a compare is one call to comparator.compare()
Problem 3. (Autocomplete) In this part, you will implement a data type that provides autocomplete functionality for a given set of string and weights, using Term and BinarySearchDeluxe. To do so, sort the terms in lexicographic order; use binary search to find the set of terms that start with
CS146 Project 3
a given prefix; and sort the matching terms in descending order by weight. Organize your program by creating an immutable data type Autocomplete in Autocomplete.java with the following API:
Corner cases. The constructor and each method should throw a java.lang.NullPointerException if its argument is null.
Performance requirements. The constructor should make proportional to N logN compares (or better) in the worst case, where N is the number of terms. The allMatches() method should make proportional to logN +M logM compares (or better) in the worst case, where M is the number of matching terms. The numberOfMatches()method should make proportional to logN compares (or better) in the worst case. In this context, a compare is one call to any of the compare() or compareTo() methods defined in Term.
CS146 Project 3
Interactive GUI. The program AutocompleteGUI takes the name of a le and an integer k as command-line arguments and provides a GUI for the user to enter queries. It presents the top k matching terms in real time. When the user selects a term, the GUI opens up the results from a Google search for that term in a browser.
Data Under the data directory, we provide sample input les for testing. Each le consists of an integer N followed by N pairs of query strings and nonnegative weights. There is one pair per line, with the weight and string separated by a tab. A weight can be any integer between 0 and 263-1. A query string can be an arbitrary sequence of Unicode characters, including spaces (but not newlines). For example, wiktionary.txt contains the 10,000 most common words in Project Gutenberg, with weights proportional to their frequencies.
Acknowledgements This project is an adaptation of the Autocomplete Me assignment developed by Princeton University.