$24
Restrictions and Requirements
• No global variables may be used
• Your submission must contain at least 2 or more .cpp files and one or more .h files
• Your submission must have at least 3 user defined functions in it in addition to main
Description
Write a program that displays the top N most occuring words in a file along with the number of times the word appeared.
Additional Details
• Words should be displayed from most commonly occuring to least commonly occuring
• A word is considered to be 1 or more consecutive alphanumeric characters
• Case does not matter when counting words
◦ HELLO and hello are to be considered the same word
◦ When displaying the most commonly occuring words they should all be displayed in lowercase
• When counting a word all leading and trailing non-alphabetical, non-numeric characters should removed for a more accurate count
◦ For example
▪ hello
▪ hello,
▪ hello.
▪ hello;
▪ !!$#%hello<>?/
◦ Are all considered to be the same word
◦ The complete list of special characters is: ,.:;"|\!@#$%^&*()_+-=[]{}<>?/~`'
• If multiple words tie for most commonly occurring they should all be displayed
◦ These words should be displayed in alphabetical order
• You should ignore the following words when counting the most common occurring words because they are so frequent and aren't interesting
◦ a, an, and, in, is, it the
• If there are fewer than N unique occurences of a word all words should be displayed
◦ For example if there were 5 unique words in a file but the user asked to display the top 10 words then only the top 5 will be displayed as there are only 5 words in the file
Input
• All input will be valid
Command Line Arguments
• First Argument: The path to the file
◦ Required
• Second Argument: N, the number of top words to find
◦ Will always be an integer greater than or equal to 1
◦ Optional. If not given N should default to 10
Hints
• When opening the file to read from it make sure to use only an ifstream and not an fstream. This is because you only have read permissions on the files on Kodethon and opening a file with an fstream requires both read and write permissions. Since you don’t have write permissions attempting to open a file with an fstream in testing will cause you to fail with weird behavior.
• The algorithm librarycontains many useful functions for helping to solve this problem
• You will find mapsto be incredibly useful in solving this problem
◦ By default a map will sort the values in ascending order. You can change this by providing a comparator function. This exampleshows how to do that.
Examples
• Input has been underlined so that you can differentiate between what is input and what is output
◦ You do not have to underline anything
• Assume that shake_it_off.txt contains the lyrics to Taylor Swift's song "Shake it Off" which can be found here: shake_it_off.txt
• I’ve also provided an example executable named ExampleTopCommonWords that can be run by doing ./ExampleTopCommonWords path_to_file num_words_to_find
◦ It is only guaranteed to run on Kodethon and may not run on your personal computer
Example 1
./TopCommonWords shake_it_off.txt 5
1.) These words appeared 78 times: {shake}
2.) These words appeared 70 times: {i}
3.) These words appeared 44 times: {off}
4.) These words appeared 21 times: {gonna}
5.) These words appeared 15 times: {break, fake, hate, play}