Project (JAVA) Solution

Do writings by individual authors have statistical signatures? They certainly do, and while such signatures may say little about the quality of an author's art, they can say something about literary styles of an era, and can even help clarify historical controversies about authorship. Statistical studies, for example, have shown that the Iliad and the Odyssey were not written by a single individual.

Project Objectives

Use inheritance to specialize the functionality provided by existing code.

Write statements that process lines of text from a file.

Use arrays to record observations about a data set.

Write a class that conforms to an existing specification.

For this assignment you are to create a program that analyzes samples of text -- novels perhaps, or newspaper articles -- and produces two statistics about these texts: word size frequency, and average word length.

The program consists of three classes: FileAccessor, WordPercentagesDriver and WordPercentages. For this project you will write the WordPercentages class, which must compile and work with the FileAccessor and WordPercentagesDriver classes provided. The FileAccessor class provides basic file I/O functionality. The driver class reads in the name of a file that contains the text to be analyzed, creates an instance of WordPercentages, obtains the statistics, and prints them to the console.

Note: your numbers may differ by ±0.01.

You can obtain interesting sample texts by, for example, visiting the Project Gutenberg website (gutenberg.org) and downloading books from there.

Your job, then, is to code a solution to this problem, and provide these two statistics - word size percentage, for word lengths from 1 to 15 and greater, and average word length.

The source code files provided are WordPercentagesDriver.java and FileAccessor.java.

Notice that the output formatting is NOT produced by the WordPercentages code. It is done by the printWordSizePercentages method in the driver class.

Project Requirements:

Your WordPercentages class must extend the FileAccessor class to read the lines of the text file. Points will be deducted if you do not do this properly.

Your WordPercentages class must have a constructor that takes the file name as a parameter.

You must define and implement the getWordPercentages method in your WordPercentages class, which takes no parameters and returns an array of type double. This array contains 16 cells. The index of each cell is a word length, and the value of the cell is the percentage of all words in the text that have that length. For example, if the cell at index 5 has the value 12.958167330677291, then approximately 13% of all words in the text had a length of 5. NOTE: The output is formatted in the printWordSizePercentages method to a precision of 2 decimal places. The values in your array are not formatted. The cell at index 0 will not be used, since there are no words of length 0. The cell at index 15 will hold the percentage of words of length 15 and greater.

You must define and implement the getAvgWordLength method in your WordPercentages class, which takes no parameters and returns a single double value: the average word length observed in the text.

You MUST use the String class method split with ONLY these delimiters to tokenize each line of text into words: split("[,.;:?!() ]") and no other filtering.

You must NOT include words of zero length in your calculations.

You must use an array to store the frequencies of word lengths.
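The interaction between the required split call and the zero-length rule can be seen in a small sketch. The class and method names here (SplitDemo, tokenize, countWords) are illustrative assumptions only and are not part of the provided FileAccessor or WordPercentagesDriver code. The sketch shows that adjacent delimiters, such as a comma followed by a space, yield empty strings that have to be skipped when counting.

```java
import java.util.Arrays;

// Illustrative helper only; not one of the classes supplied with the assignment.
class SplitDemo {
    // The delimiter set required by the assignment.
    static final String DELIMITERS = "[,.;:?!() ]";

    // Splits one line of text using only the required delimiters.
    static String[] tokenize(String line) {
        return line.split(DELIMITERS);
    }

    // Counts the tokens that are real words, skipping zero-length tokens.
    static int countWords(String line) {
        int words = 0;
        for (String token : tokenize(line)) {
            if (token.length() > 0) {
                words++;
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String line = "Well, (yes) it is.";
        // Adjacent delimiters such as ", (" produce empty tokens.
        System.out.println(Arrays.toString(tokenize(line))); // prints [Well, , , yes, , it, is]
        System.out.println(countWords(line));                // prints 4
    }
}
```

Trailing empty tokens are dropped by split itself, but empty tokens at the start or in the middle of a line are kept, so the length check is still needed.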

Tips/Comments

Note that the split method mentioned above will not produce only pure words. You can use your debugger to investigate the words that are tokenized from each line of text. It is difficult to write a parser that is 100% correct for all documents. This does not matter, as we assume that an author's word choice will be discovered by our (imperfect) parsing given large samples of text; the errors will be small compared to the real words counted.

In order to calculate the percentage of a certain word length, you need to keep track of the number of times words of that length occur in the text. Use an integer array to keep a count for each word length, using the word length as the index. Remember that lengths of 15 and greater are grouped together. When all words have been checked, you can compute their percentages as the frequency of a word length divided by the total word count of the text, multiplied by 100. These calculations are stored in a double array.
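The counting and percentage steps described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the required WordPercentages class: the names LengthCounts, record, and toPercentages are hypothetical, and the sketch takes lines as plain strings rather than reading them through FileAccessor, whose methods are not shown here.

```java
// Illustrative helper only; names are assumptions, not part of the provided code.
class LengthCounts {
    static final int MAX_LENGTH = 15; // lengths of 15 and greater share one cell

    // Adds the word lengths found in one line to the counts array (size 16).
    static void record(int[] counts, String line) {
        for (String token : line.split("[,.;:?!() ]")) {
            int length = token.length();
            if (length == 0) {
                continue; // zero-length tokens are not words
            }
            counts[Math.min(length, MAX_LENGTH)]++;
        }
    }

    // Converts counts to percentages of the total word count.
    static double[] toPercentages(int[] counts) {
        int total = 0;
        for (int len = 1; len < counts.length; len++) {
            total += counts[len];
        }
        double[] percentages = new double[counts.length];
        if (total == 0) {
            return percentages; // no words seen: leave every cell at 0.0
        }
        for (int len = 1; len < counts.length; len++) {
            // 100.0 forces floating-point arithmetic before the division.
            percentages[len] = 100.0 * counts[len] / total;
        }
        return percentages;
    }

    public static void main(String[] args) {
        int[] counts = new int[MAX_LENGTH + 1];
        record(counts, "a bb bb ccc");
        double[] p = toPercentages(counts);
        System.out.println(p[1] + " " + p[2] + " " + p[3]); // prints 25.0 50.0 25.0
    }
}
```

Index 0 is left unused in both arrays, matching the 16-cell layout in the requirements.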

The average word length can be calculated by multiplying each word length by its frequency, summing those products, and dividing the sum by the total word count of the text.

If you use an integer array to store the frequencies (counts) of words, the index is the length of the words. So, your calculation would be:

average word length = [ sum(frequency * index)] / total word count

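The above is a general formula. Remember that in Java you need to use the proper data types to do the calculation properly: dividing one int by another discards the fractional part, so at least one operand must be a double for the result to keep its decimals.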
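As a rough sketch of the formula, assuming the counts are held in an int array indexed by word length, the average could be computed as below. The class and method names (AverageLength, average) are hypothetical. Because lengths of 15 and greater share one cell, this version treats every such word as length 15, which is what the formula implies when applied to the counts array.

```java
// Illustrative helper only; names are assumptions, not part of the provided code.
class AverageLength {
    // Returns sum(frequency * index) / total word count as a double.
    static double average(int[] counts) {
        long weightedSum = 0;
        long total = 0;
        for (int len = 1; len < counts.length; len++) {
            weightedSum += (long) counts[len] * len;
            total += counts[len];
        }
        if (total == 0) {
            return 0.0; // avoid dividing by zero when no words were counted
        }
        // The cast makes this a floating-point division instead of an integer one.
        return (double) weightedSum / total;
    }

    public static void main(String[] args) {
        int[] counts = {0, 1, 1}; // one word of length 1, one of length 2
        System.out.println(average(counts)); // prints 1.5
        System.out.println(3 / 2);           // prints 1 (integer division truncates)
    }
}
```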

Place your WordPercentages class in the box below.

import java.io.*;