Assignment 2: MapReduce and Pig Solution

Description




In this assignment, you need to implement the k-nearest neighbors (KNN) query using MapReduce and Pig. In the KNN query, the input is a set of points in the Euclidean space, a query point, and an integer k. The output is the k points in the set that are closest to the query point. A naïve solution is to compute the distance from each point in the set to the query point, sort all the points by their distance, and choose the top k points. However, this naïve solution might not be directly applicable in MapReduce. Your task is to provide two implementations, one using a MapReduce program in Hadoop, and the other using a plain Pig Latin script.
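For reference, the naïve single-machine approach described above (compute every distance, sort, keep the closest points) can be sketched in plain Java as follows. This is only an illustration of the query semantics, not the MapReduce or Pig solution the assignment asks for. The class name NaiveKnnSketch and its methods are hypothetical, and the sketch assumes two-dimensional points stored as {x, y} arrays.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for illustration only; it is NOT the required
// edu.ucr.cs.cs226.<UCR net ID>.KNN class and does not use Hadoop.
class NaiveKnnSketch {

    // Squared Euclidean distance between two 2-D points stored as {x, y}.
    // Squaring preserves the ordering, so the square root is not needed.
    static double squaredDistance(double[] a, double[] b) {
        double dx = a[0] - b[0];
        double dy = a[1] - b[1];
        return dx * dx + dy * dy;
    }

    // Returns the k points closest to the query point, nearest first.
    // If fewer than k points exist, all of them are returned.
    static List<double[]> nearest(List<double[]> points, double[] query, int k) {
        List<double[]> sorted = new ArrayList<>(points);
        sorted.sort(Comparator.comparingDouble(
                (double[] p) -> squaredDistance(p, query)));
        return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
    }

    public static void main(String[] args) {
        // Made-up sample points, not taken from the assignment data file.
        List<double[]> points = Arrays.asList(
                new double[]{0, 0},
                new double[]{5, 5},
                new double[]{1, 1},
                new double[]{-3, 0});
        for (double[] p : nearest(points, new double[]{0, 0}, 2)) {
            System.out.println(Arrays.toString(p));
        }
    }
}
```

Sorting the whole input costs O(n log n) and needs all points in one place, which is why this approach does not carry over directly to a distributed setting.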




Submission instructions

The assignment is due on Friday, 11/23/2018, at 11:59 PM Pacific Time.




Late submissions are allowed with a 20% penalty for each calendar day, up to four days late.




The Java class should be named KNN in the package edu.ucr.cs.cs226.<UCR net ID> where <UCR net ID> is replaced with your ID all in lower-case letters. For example, the Java class could be named ‘edu.ucr.cs.cs226.eldawy.KNN’.




A sample file can be accessed on the following link: https://drive.google.com/open?id=1hEpg_-XecIKwGcTLIShBw8pvEhtCPaEc




Notice that the file is compressed. You can decompress it first if you want. The file contains three comma-separated fields: ID, x, and y.
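As a small illustration of the record layout just described, the sketch below parses one line of the form ID,x,y. The class name PointRecordSketch is hypothetical. The sketch assumes that x and y are decimal numbers and keeps the ID as a string, since the handout does not state the field types. The sample line in main is made up, not taken from the data file.

```java
// Hypothetical record holder for illustration; field types are assumptions.
class PointRecordSketch {
    final String id;
    final double x;
    final double y;

    PointRecordSketch(String id, double x, double y) {
        this.id = id;
        this.x = x;
        this.y = y;
    }

    // Parses one line with exactly three comma-separated fields: ID, x, y.
    static PointRecordSketch parse(String line) {
        String[] fields = line.split(",");
        if (fields.length != 3) {
            throw new IllegalArgumentException("Expected 3 fields, got " + fields.length);
        }
        return new PointRecordSketch(
                fields[0].trim(),
                Double.parseDouble(fields[1].trim()),
                Double.parseDouble(fields[2].trim()));
    }

    public static void main(String[] args) {
        // Made-up sample line, not from the assignment's sample file.
        PointRecordSketch p = parse("42,1.5,-2.25");
        System.out.println(p.id + " -> (" + p.x + ", " + p.y + ")");
    }
}
```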




Please upload your answer in a single ZIP file named ‘cs226-asg2-<UCR net ID>.zip’ where <UCR net ID> is replaced with your ID. The ZIP file should contain a directory named ‘KNN’ which contains the full directory structure as generated by Maven with your implementation. Please remove any binary files and keep only the KNN.java file and pom.xml file. You can optionally include a README file for compilation instructions and a LICENSE file.

Place your Pig Latin script under ‘KNN/src/main/pig/KNN.pig’




Add a PDF file under ‘KNN/report.pdf’ which answers the following questions:




Which InputFormat did you use in the MapReduce program?



What are the input and output formats of the map function?



What is the logic of the map function?



If a combiner function is used, what is the signature of the combiner function (input and output) and what is its logic?



If a reduce function is used, what is the signature of the reduce function (input and output) and what is its logic?



How many mappers and reducers are needed for your program?



How many records are shuffled between the mappers and reducers?



For the Pig Latin program, how many MapReduce jobs are needed to run the program?



How does this compare to the MapReduce implementation?

Failing to follow the instructions above might result in losing some points.
