Trie Articles Solution




A file, companies.dat, lists company names, one company per line. A company name may contain multiple words. A company may also have several names: the first name on the line is the primary name and the rest are synonyms, with all names on the line separated by tabs. (Create a sample version of this file for your testing. The final file used for grading is not published.)
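The file layout described above can be sketched as a small parser. This is only an illustration: the function name `parse_companies` and the sample lines are made up for the example, and keeping the first definition when a name appears on several lines is an assumption the assignment does not address.

```python
from typing import Dict, Iterable


def parse_companies(lines: Iterable[str]) -> Dict[str, str]:
    """Map every name (primary or synonym) to the primary name on its line."""
    name_to_primary: Dict[str, str] = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        # Names on one line are separated by tabs; the first is the primary name.
        names = [n.strip() for n in line.split("\t") if n.strip()]
        primary = names[0]
        for name in names:
            # Assumption: if a name repeats across lines, the first line wins.
            name_to_primary.setdefault(name, primary)
    return name_to_primary


# Hypothetical sample content for a test version of the file.
sample = [
    "Microsoft\tMicrosoft Corporation, Inc.\tMSFT\n",
    "Apple Inc.\tApple\n",
    "Verizon Wireless\n",
]
table = parse_companies(sample)
```

In a real program the same function could be fed an open file handle, for example `parse_companies(open("companies.dat", encoding="utf-8"))`, using a relative path.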




Write a program that can read a news article from standard input. Keep reading until you get a line in the article that consists entirely of a period symbol (.).
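One possible way to implement the read loop is sketched below. The helper name `read_article` is hypothetical, and tolerating whitespace around the terminating period is an assumption; a stricter reading of the requirement would compare the line to "." exactly.

```python
from typing import Iterable, List


def read_article(lines: Iterable[str]) -> List[str]:
    """Collect article lines until a line consisting only of a period."""
    article: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        # Assumption: surrounding spaces on the terminator line are tolerated.
        if line.strip() == ".":
            break
        article.append(line)
    # If the input ends without a terminator, everything read so far is returned.
    return article


demo = read_article(["Microsoft released new products today.\n", ".\n", "ignored\n"])
```

Passing `sys.stdin` as the argument reads the article from standard input, since a file object iterates line by line.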




Identify each company name in the article, and display each company name on the screen, one line at a time. Always display the primary name of the company identified, not the synonym you found in the text. On the same line, display the "relevance" of the company name hit. Relevance is defined as the frequency of the company name appearing in the article divided by the number of words in the article. For example, Microsoft in "Microsoft released new products today." should result in a relevance of 1/5, or 20%. If two names for the same company match, they count as matches for the same company. Display the relevance as a percentage. You should ignore the following words in the article (but not in the company name) when considering relevance:




a, an, the, and, or, but
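The relevance rule and the stopword list above can be sketched as follows. The names `count_article_words`, `relevance_percent`, and the `protected` parameter are hypothetical, and comparing stopwords case-insensitively (so that "The" is also ignored) is an assumption, since the assignment only lists the lowercase forms.

```python
from typing import AbstractSet, List

STOPWORDS = {"a", "an", "the", "and", "or", "but"}


def count_article_words(tokens: List[str], protected: AbstractSet[int] = frozenset()) -> int:
    """Count words, skipping stopwords unless they sit inside a matched company name.

    `protected` holds the token positions covered by company-name matches,
    so a stopword that is part of a company name is still counted.
    """
    return sum(
        1
        for i, tok in enumerate(tokens)
        # Assumption: stopwords are compared case-insensitively.
        if i in protected or tok.lower() not in STOPWORDS
    )


def relevance_percent(hits: int, total_words: int) -> float:
    """Relevance = hits / total words, expressed as a percentage."""
    return 100.0 * hits / total_words if total_words else 0.0


# The example from the assignment text: one hit among five words.
tokens = "Microsoft released new products today".split()
total = count_article_words(tokens)
microsoft_relevance = relevance_percent(1, total)
```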




You must normalize the company names for the search. Punctuation and other symbols should not impact the search. So the appearance of Microsoft Corporation, Inc. in the companies.dat file should match with Microsoft Corporation Inc in the article. However, the search must be case sensitive.
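A minimal sketch of normalization combined with a trie lookup is shown below. Several choices here are assumptions rather than requirements: the trie is keyed by whole words instead of single characters, punctuation is treated as a word separator (so "AT&T" becomes two tokens), and the scanner prefers the longest match at each position. The names `normalize`, `TrieNode`, `CompanyTrie`, and `find_all` are made up for the example.

```python
import re
from typing import Dict, List, Optional, Tuple


def normalize(text: str) -> List[str]:
    """Split text into case-preserving word tokens, dropping punctuation and symbols."""
    return re.findall(r"[^\W_]+", text)


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.primary: Optional[str] = None  # set when a company name ends here


class CompanyTrie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, name: str, primary: str) -> None:
        """Store a normalized name (primary or synonym) under its primary name."""
        tokens = normalize(name)
        if not tokens:
            return  # nothing searchable in this name
        node = self.root
        for tok in tokens:
            node = node.children.setdefault(tok, TrieNode())
        node.primary = primary

    def find_all(self, tokens: List[str]) -> List[Tuple[int, int, str]]:
        """Return (start, end, primary) for the longest match at each position."""
        matches: List[Tuple[int, int, str]] = []
        i = 0
        while i < len(tokens):
            node, best_end, best_primary = self.root, None, None
            j = i
            # Walk the trie as far as the article tokens allow (case sensitive).
            while j < len(tokens) and tokens[j] in node.children:
                node = node.children[tokens[j]]
                j += 1
                if node.primary is not None:
                    best_end, best_primary = j, node.primary
            if best_end is None:
                i += 1
            else:
                matches.append((i, best_end, best_primary))
                i = best_end  # continue after the matched name
        return matches


trie = CompanyTrie()
trie.insert("Microsoft", "Microsoft")
trie.insert("Microsoft Corporation, Inc.", "Microsoft")
trie.insert("Apple Inc.", "Apple Inc.")
article_tokens = normalize("Microsoft Corporation Inc beat Apple Inc. today; microsoft did not.")
found = trie.find_all(article_tokens)
```

Because both the company names and the article go through the same `normalize` function, "Microsoft Corporation, Inc." in the file matches "Microsoft Corporation Inc" in the article, while the lowercase "microsoft" is not matched.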




Output:




Company | Hit Count | Relevance
Microsoft | 6 | 4.38889%
Apple Inc. | 4 | 3.08333%
Verizon Wireless | 2 | 2.38889%
Total | 12 | 10%
Total Words | 120











Output should consist of




Each Company Name, Hit Count, and the Relevance (Relevance = HitCount / Total Number of Words).




The second to last row of your output should read Total, Total Hit Count, and Total Relevance.




The last row should simply output the total number of words in the file.
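The row layout described above could be produced along these lines. This is a sketch under assumptions: the function name `format_report` is hypothetical, the column widths are arbitrary, and printing four decimal places is one reading of the grading criteria, not a confirmed format.

```python
from typing import Dict, List


def format_report(hits: Dict[str, int], total_words: int) -> List[str]:
    """Build the output rows: one per company, then Total, then Total Words."""

    def pct(count: int) -> str:
        value = 100.0 * count / total_words if total_words else 0.0
        return f"{value:.4f}%"  # assumption: four decimal places

    rows = [("Company", "Hit Count", "Relevance")]
    for name, count in hits.items():
        rows.append((name, str(count), pct(count)))
    total_hits = sum(hits.values())
    rows.append(("Total", str(total_hits), pct(total_hits)))
    rows.append(("Total Words", str(total_words), ""))

    # Pad the first column to the longest label so the columns line up.
    width = max(len(row[0]) for row in rows)
    return [f"{a:<{width}}  {b:>9}  {c:>10}".rstrip() for a, b, c in rows]


# Hypothetical hit counts, keyed by primary company name.
lines = format_report({"Microsoft": 6, "Apple Inc.": 4, "Verizon Wireless": 2}, 120)
```

Calling `print("\n".join(lines))` writes the report to the screen.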







Note: If you are working in Node.js/JavaScript, you must not submit your "node_modules" folder. (Just submit your JavaScript source code and package.json file.)







Trie Articles

















Criteria | Full Marks | No Marks | Pts
Input: Prompt user for a news article. | 5.0 pts | 0.0 pts | 5.0 pts
Input: Read data from file named "company.dat". (No points if the filename is incorrect or an absolute path is used) | 5.0 pts | 0.0 pts | 5.0 pts
Calculate: Company's hit count (includes synonyms) | 5.0 pts | 0.0 pts | 5.0 pts
Calculate: Company's Relevance (Must be a decimal value up to 4 digits. Ex: 6.000%) | 5.0 pts | 0.0 pts | 5.0 pts
Stopwords: Ignore words "a", "an", "the", "and", "or", and "but". (-8 points if these words in company names are ignored) | 10.0 pts | 0.0 pts | 10.0 pts
Output: Every line should have Company Name, Hit Count, and the Relevance | 5.0 pts | 0.0 pts | 5.0 pts
Output: Second last row should have Total, Total Hit Count, and Total Relevance. | 5.0 pts | 0.0 pts | 5.0 pts
Output: The last row should have the total number of words in the file. | 5.0 pts | 0.0 pts | 5.0 pts
Data Structure: Implementation of Tries | 30.0 pts | 0.0 pts | 30.0 pts
Search: Normalize company name | 5.0 pts | 0.0 pts | 5.0 pts
Search: No impact of punctuation and other symbols | 5.0 pts | 0.0 pts | 5.0 pts
Search: Case sensitive | 5.0 pts | 0.0 pts | 5.0 pts
Coding Style and Test Cases | 10.0 pts | 0.0 pts | 10.0 pts
Note: (a) Late submission penalty per policy (b) 5-point penalty for improper output format and indentation. | 0.0 pts | 0.0 pts | 0.0 pts

Total Points: 100.0
























































































