$24
The goal of this week’s assignment is:
1. Practice sorting
2. Practice measuring the runtime of a program
3. Practice working with generics
4. Continue learning the significance of special cases!
5. Learning how to write tests to check your implementation!
Things you must do:
1. There are many details in this assignment. Make sure you read the whole thing carefully before writing any code and closely follow this instruction.
2. You must complete your assignment individually.
3. Always remember Java is case sensitive.
4. Your file names, class names, and package name must match exactly as they are specified here.
Things you must not do:
1. You must not change the file names, class names, package names.
2. You must not change the signature of any of these methods (name, parameters, …).
For this lab (and assignment), you are asked to construct a program that has the
capability to determine if two words are anagrams and to find the largest group of
anagrams in a list of words. Two words are anagrams if they contain the same
letters with the same frequency. For example, alert and later are anagrams. There
are many websites devoted to listing anagrams, such as this one
(http://www.english-for-students.com/Complete-List-of-Anagrams.html) if you are
unclear about the concept yet.
We have learned about how sorting algorithms work and how to measure the efficiency of a program by its time complexity. Now that we know how to implement some popular sorting algorithms, we are eager to use sorting to solve these two problems. We will implement two sorting algorithms for the task: insertion sort and merge sort. While we are at it, we will also measure the time our sorting implementations take so that we can predict the performance for much larger inputs, say of size 100K or even 1M!
To check if two words are anagrams, simply sort the characters in each word. If the sorted versions are the same, the words are anagrams of each other. Words with the same letters with the same frequency, but different cases are still anagrams. E.g., Begin and being are anagrams. Hint: Java's built-in String method toLowerCase will come in handy when sorting and determining if words are anagrams, but you must still output them in their original case. We will concern ourselves only with word anagrams and not anagrams that are phrases, which may contain whitespace and punctuation.
Part 0
• Create a new Java project and call it Lab06.
• Create a package inside your project and call it lab06.
• Download the Lab06-Assignment06 ZIP folder from Blackboard (google drive link). Copy and paste the following files into your new lab06 package:
◦ AnagramUtil.java
◦ InsertionSort.java
◦ MergeSort.java
◦ SortedString.java
◦ Tester.java
Part 1 - SortedString implementation
The SortedString class is responsible for storing the sorted version of a String along with the (unsorted) original. We begin with finishing its implementation.
• public SortedString(String unsorted)
◦ This constructor initializes the two member variables unsorted and sorted in the class based on the input string.
◦ The member variable sorted can be determined by converting the input parameter unsorted to a char array using toLowerCase().toCharArray() and then converting back to a string after sorting. Remember that we do not care about upper/lower case for the sorted string.
◦ To sort the characters in this array, we will use Arrays.sort(chars)
(don’t worry, in the next steps we will implement and use our own :-)
◦ Be sure to set both member variables unsorted and sorted
• public int compareTo(SortedString other)
◦ This method compares two SortedStrings based on the sorted string.
◦ Note that this method will be used later for making comparisons in our sorting algorithms.
• Before moving to Part 2, test your SortedString implementation in Tester.java. You should check:
◦ Instantiating a SortedString object from a string, e.g. “zebra”
• Checking the sorted and unsorted versions. Is the sorted version sorted correctly? Does the unsorted version contain the original string? Print the strings and make sure they are what you expect.
• Comparing two SortedStrings. For instance, “joy” is bigger than “ski”. Why? What about “below” and “Elbow”?
Part 2 - AnagramUtil implementation: phase 1
• public static boolean areAnagrams( SortedString str1, SortedString str2)
◦ This method returns true if the two input SortedStrings are anagrams of each other, otherwise returns false.
◦ Remember: to check if two words are anagrams, simply compare the sorted versions of each word. If the sorted versions are the same, the words are anagrams of each other.
• Before moving to Part 3, test your AnagramUtil implementation in Tester.java. For instance, if you pass in a string, say your first name, and your first name again but with the characters reversed, you should get true! But, if you pass in your first name and your last name, you should get false!
• You should also check:
◦ empty string vs another string
◦ exactly same string
◦ shuffled strings
◦ two different strings
◦ make sure the answers are case insensitive
Part 3 - InsertionSort implementation
Now we will implement our first sorting algorithm using insertion sort.
• public E[] sort(E[] array)
◦ This method takes a generic input array and returns a sorted array using an insertion sort. Since this method is generic, we cannot use > or < for
comparing values. Instead, we must make use of Comparable, a la compareTo().
▪ Note: If InsertionSort is instantiated with type SortedString, i.e., InsertionSort<SortedString>, the sort() method will invoke the compareTo() we just wrote in part 1. Do you see why?
◦ This method should not modify the input array. Instead it should make a copy of the array using array.clone(), and sort the clone.
◦ Take care of special cases! What if the array is of size 1?
• Before moving to Part 4, test your InsertionSort implementation in Tester.java. You should check the following with InsertionSort<Integer> :
○ a list with one element ○ a list with two elements ○ a sorted list of numbers ○ a random list of numbers
• For the task at hand, we are more interested in an InsertionSort that handles types of SortedString, i.e. InsertionSort<SortedString>. To test this, you may find it helpful to use the convenience function toSortedString from SortedString. For instance:
◦ if myArray is a String array, the following will return a SortedString[] representation: SortedString.toSortedString(myArray)
Handy!
• Now, if we have InsertionSort<SortedString> and we pass the following array of strings to its sort function: {“joy”, “ski”, “fed”, cat”}. Here is how the output should look like {“cat”, “fed”, “ski”, “joy”}. Make sure you understand this example.
Part 4 - Predicting runtime for InsertionSort
For the last part of lab, we will measure the performance of InsertionSort using the time formula you learned in class. This formula is quite useful as it allows us to make predictions about how our function will do for much larger inputs, without actually running it.
● public double O(int n)
◦ This method returns the order O() of the implementation.
◦ Consult the lecture slides if you have doubts
◦ Note: for a runtime of O(n2), it is better to use Math.pow(n, 2) rather than n * n.
• public void fit(E[] array)
◦ This method calculates the constant c using the given input array. Time measurements are measured in microseconds.
◦ Remember the formula has form time = c * O(), where time is the time it takes to run the insertion sort for the input array and O() is the big-Oh given from the function above.
◦ To test the insertion sort algorithm, make a call to sort().
◦ To time your program, you can use System.nanoTime(). Remember to convert nanoseconds to microseconds!
• public double predict(int n)
◦ This method predicts the running time of an insertion sort for an input size n. The estimated time return is in unit microseconds.
◦ Note: this method will NOT run the insertion sort with the given n. Instead, you must use the calculated constant c from fit() to make a prediction for the n. You may assume that fit() has been called previously before predict().
To test your timing functions, try running fit() with a modestly sized array, say with 10 elements. Then try running predict() for an input size much larger, say 100K or even 1M. How much time would it take to sort such a large list? Would it finish by the end of lab? Think about whether your answers make sense given what you know about insertion sort. If you have doubts, consult your lab instructors.
Part 5 - Testing
Unlike previous labs/assignments you are not given any tests as a starting point. You must create your own tests to examine all and every method you implemented in the previous part. Several testing suggestions were made throughout the lab. Make sure you check the functionality of your methods very carefully.
Don’t forget: The lab/assignment is due
Tuesday night @ 11:59pm!