Coursework 1 Solution

This coursework requires you to write four MapReduce programs. These programs should be written using Python 3 and the Python mrjob library. Each solution should distribute computation across multiple mapper and/or reducer tasks.

Part 1

Given a CSV file where each line contains a set of numbers, write a MapReduce program which determines the maximum of all numbers in the file. For example, consider the following sample CSV file:



Given this CSV file, the maximum is 4.

Entitle the Python program in question. That is, entering the following command at the terminal should result in your MapReduce program being applied to fileName.csv:

pipenv run python fileName.csv
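A minimal sketch of the map/combine/reduce logic for Part 1. It assumes every non-blank cell is a number that float() can parse and that blank cells (for example after a trailing comma) should be skipped. So that it runs without mrjob installed, the three steps are plain generator functions, and run_local is a hypothetical driver that assigns lines round-robin to a few simulated map tasks, runs the combiner once per task, groups the results by key and then reduces. All names are illustrative.

```python
"""Local sketch of the maximum-of-all-numbers MapReduce logic (Part 1)."""
from collections import defaultdict


def mapper(line):
    # Emit every number on the line under one shared key (None).
    for field in line.split(","):
        field = field.strip()
        if field:  # skip empty cells, e.g. after a trailing comma
            yield None, float(field)


def combiner(key, values):
    # Local maximum inside one map task: only one value leaves the task.
    yield key, max(values)


def reducer(key, values):
    # Global maximum over the per-task maxima.
    yield key, max(values)


def run_local(lines, n_map_tasks=2):
    # Hypothetical driver: round-robin the lines into n_map_tasks chunks,
    # run mapper + combiner per chunk, group by key, then reduce.
    grouped = defaultdict(list)
    for task in range(n_map_tasks):
        local = defaultdict(list)
        for line in lines[task::n_map_tasks]:
            for key, value in mapper(line):
                local[key].append(value)
        for key, values in local.items():
            for k, v in combiner(key, values):
                grouped[k].append(v)
    return [pair for key, values in grouped.items() for pair in reducer(key, values)]


if __name__ == "__main__":
    print(run_local(["1,2,3", "4,0", "2.5"]))  # [(None, 4.0)]
```

Because every number shares the key None, a single reducer produces the final answer, but the combiner first cuts each map task's output down to one value, which is where the work is spread across tasks. In an mrjob job the same bodies would become the mapper(self, _, line), combiner(self, key, values) and reducer(self, key, values) methods of an MRJob subclass, started with MyJob.run() under if __name__ == '__main__' (MyJob is a placeholder name).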

Part 2

Given a CSV file where each line contains a set of numbers, write a MapReduce program which determines the mean of all numbers in the file. For example, consider the following sample CSV file:



Given this CSV file, the mean is 2.8.

Entitle the Python program in question. That is, entering the following command at the terminal should result in your MapReduce program being applied to fileName.csv:

pipenv run python fileName.csv
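A minimal sketch of the Part 2 logic, under the same assumptions as a plain-Python stand-in for mrjob (numbers parsed with float(), blank cells skipped; mapper, merge, combiner, reducer and run_local are illustrative names). The key point it shows is that map tasks and combiners pass partial (sum, count) pairs rather than partial means, and the division happens only once, in the reducer.

```python
"""Local sketch of the mean-of-all-numbers MapReduce logic (Part 2)."""
from collections import defaultdict


def mapper(line):
    # Emit one partial (sum, count) pair per line under a single shared key.
    numbers = [float(field) for field in line.split(",") if field.strip()]
    if numbers:
        yield None, (sum(numbers), len(numbers))


def merge(pairs):
    # Add up partial (sum, count) pairs; works for tuples or JSON lists.
    total, count = 0.0, 0
    for partial_sum, partial_count in pairs:
        total += partial_sum
        count += partial_count
    return total, count


def combiner(key, pairs):
    # Shrink one map task's output to a single (sum, count) pair.
    yield key, merge(pairs)


def reducer(key, pairs):
    # Divide only once, after every partial sum and count has been merged.
    total, count = merge(pairs)
    yield key, total / count


def run_local(lines, n_map_tasks=2):
    # Hypothetical driver that imitates map tasks, combiners and the shuffle.
    grouped = defaultdict(list)
    for task in range(n_map_tasks):
        local = defaultdict(list)
        for line in lines[task::n_map_tasks]:
            for key, value in mapper(line):
                local[key].append(value)
        for key, values in local.items():
            for k, v in combiner(key, values):
                grouped[k].append(v)
    return [pair for key, values in grouped.items() for pair in reducer(key, values)]


if __name__ == "__main__":
    print(run_local(["1,2,3", "4", "5,6"]))  # [(None, 3.5)]
```

Averaging per-task means instead would be wrong: with two map tasks the example input splits into the lines "1,2,3" and "5,6" (mean 3.4) and the line "4" (mean 4.0), whose average 3.7 differs from the true mean 21 / 6 = 3.5. If these functions were moved into an mrjob job, the tuples would arrive at the reducer as lists, because mrjob's default internal protocol is JSON; the unpacking in merge() works either way.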

Part 3

Uniform Resource Locator (URL) links describe the structure of the web. Consider a CSV file where each line contains two URLs which specify a single link. That is, the first and second values on each line specify the source and destination of the link in question. For example, consider the following sample CSV file:






Given such a CSV file, write a MapReduce program which finds all paths of length two in the corresponding URL links. That is, it finds the triples of URLs (u, v, w) such that there is a link from u to v and a link from v to w.

For example, the sample CSV file above contains the following paths of length two:

url2, url4, url5

url1, url2, url3

url1, url2, url4

Entitle the Python program in question. That is, entering the following command at the terminal should result in your MapReduce program being applied to fileName.csv:

pipenv run python fileName.csv
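One common way to find paths of length two is a reduce-side self-join keyed on the middle URL v. The sketch below assumes each valid line holds exactly two non-empty URLs and silently skips anything else; the "in"/"out" tags, the function names and the run_local driver are hypothetical, and the example links a, b, c, d, e are made up for illustration rather than taken from the sample file.

```python
"""Local sketch of the length-two path MapReduce logic (Part 3)."""
from collections import defaultdict


def mapper(line):
    # A link u -> v is sent to two reducers: v records an incoming link
    # from u, and u records an outgoing link to v.
    fields = [field.strip() for field in line.split(",")]
    if len(fields) != 2 or not all(fields):
        return  # skip blank or malformed lines
    source, dest = fields
    yield dest, ("in", source)
    yield source, ("out", dest)


def reducer(middle, tagged_urls):
    # Every URL u linking to `middle`, paired with every URL w that
    # `middle` links to, gives one path u -> middle -> w.
    sources, dests = [], []
    for tag, url in tagged_urls:
        if tag == "in":
            sources.append(url)
        else:
            dests.append(url)
    for u in sources:
        for w in dests:
            yield None, (u, middle, w)


def run_local(lines, n_map_tasks=2):
    # Hypothetical driver: run the mapper over each chunk, group by key,
    # then call the reducer once per middle URL.
    grouped = defaultdict(list)
    for task in range(n_map_tasks):
        for line in lines[task::n_map_tasks]:
            for key, value in mapper(line):
                grouped[key].append(value)
    paths = []
    for key in sorted(grouped):
        paths.extend(path for _, path in reducer(key, grouped[key]))
    return paths


if __name__ == "__main__":
    for path in run_local(["a,b", "b,c", "b,d", "d,e"]):
        print(", ".join(path))
```

Each link is emitted twice so that both of its endpoints can act as the middle of a path. A reducer holds the full in- and out-lists for its URL in memory, so a URL with very many links would make its reducer correspondingly large. Self-loops and two-link cycles are reported as paths, since they satisfy the definition of a link from u to v and from v to w.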

Part 4

Write a MapReduce program which takes as input a file containing comma-separated words and outputs, for each word, the lines that the word appears in. For example, consider the following file:






The corresponding output will be the following:

"buffalo" ["buffalo,dolphin,cat"]

"cat" ["buffalo,dolphin,cat", "cat,horse", "dog,cat,sheep"]

"chicken" ["goat,chicken,horse"]

"dog" ["dog,cat,sheep"]

"dolphin" ["buffalo,dolphin,cat"]

"goat" ["goat,chicken,horse"]

"horse" ["cat,horse", "goat,chicken,horse"]

"sheep" ["dog,cat,sheep", "sheep"]

Entitle the Python program in question. That is, entering the following command at the terminal should result in your MapReduce program being applied to fileName.csv:

pipenv run python fileName.csv
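A minimal sketch of the Part 4 logic, again as plain Python with a hypothetical run_local driver instead of mrjob. It assumes a word repeated on one line should list that line only once, and it sorts each word's lines so the output does not depend on shuffle order. The sample list consists of the five lines that appear in the expected output above, in an assumed order (the sorting makes the order irrelevant).

```python
"""Local sketch of the word-to-lines MapReduce logic (Part 4)."""
import json
from collections import defaultdict


def mapper(line):
    # Emit (word, whole line) once per distinct word on the line.
    line = line.strip()
    words = {word.strip() for word in line.split(",") if word.strip()}
    for word in words:
        yield word, line


def reducer(word, lines):
    # Sort so the output does not depend on the order of the shuffle.
    yield word, sorted(lines)


def run_local(lines, n_map_tasks=2):
    # Hypothetical driver: map each chunk, group by word, reduce per word.
    grouped = defaultdict(list)
    for task in range(n_map_tasks):
        for line in lines[task::n_map_tasks]:
            for key, value in mapper(line):
                grouped[key].append(value)
    return [pair for key in sorted(grouped) for pair in reducer(key, grouped[key])]


if __name__ == "__main__":
    sample = ["buffalo,dolphin,cat", "cat,horse", "dog,cat,sheep",
              "goat,chicken,horse", "sheep"]
    for word, word_lines in run_local(sample):
        # JSON key and JSON value, like mrjob's default output protocol
        # (which separates them with a tab rather than a space).
        print(json.dumps(word), json.dumps(word_lines))
```

Run as a script, this prints the same eight word/line-list pairs, in the same order, as the expected output above. The JSON-encoded key and value shape appears to match what mrjob's default JSON output protocol produces when a reducer yields (word, list_of_lines).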

