Overview
The goals of this assignment are:
Set up the development environment for Hadoop
Understand and use the APIs for HDFS
Compare the performance of HDFS to the local file system
Description
Write a Java program that copies a file from the local machine to HDFS. It should run from the command line and take two command-line arguments: a path to a local file and a path in HDFS. If the local file does not exist, the program should signal an error. If the file in HDFS already exists, it should report this and fail. If the file in HDFS cannot be created for any reason, this should also be reported. Test your program on your local machine in the pseudo-distributed Hadoop installation. Do not use the methods FileSystem#copyFromLocalFile or FileSystem#moveFromLocalFile in your implementation. Instead, use the methods described in class, such as FileSystem#open and FileSystem#create.
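At the heart of such a program is a buffered copy loop between two streams. The sketch below is one possible shape for that loop, not a reference solution. It assumes only that the source and destination are available as java.io.InputStream and java.io.OutputStream, which holds for the streams returned by FileSystem#open and FileSystem#create. The class name StreamCopy, the method name copy, and the 64 KB buffer size are illustrative choices that do not come from the assignment. In-memory streams stand in for the real files so that the example runs without a Hadoop installation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper: the names here are illustrative, not part of the assignment.
final class StreamCopy {
    // Assumed buffer size; it is worth varying when measuring performance.
    private static final int BUFFER_SIZE = 64 * 1024;

    /** Copies all bytes from in to out and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[BUFFER_SIZE];
        long total = 0;
        int n;
        // read returns -1 at end of stream and may return fewer bytes than requested.
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // In-memory streams stand in for the local file and the HDFS file.
        byte[] data = new byte[200_000];
        try (InputStream in = new ByteArrayInputStream(data);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            long copied = copy(in, out);
            System.out.println("Copied " + copied + " bytes");
        }
    }
}
```

In the actual program, the try-with-resources block would hold the stream opened on the local file and the stream returned by FileSystem#create. The existence checks described above (for example with FileSystem#exists) would run before either stream is opened.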
Use the following three tasks to measure the performance of the file system and compare the performance of the LocalFileSystem to the DistributedFileSystem.
The total time for copying the 2 GB file provided in the instructions below.
The total time for reading a 2 GB file sequentially from start to end.
The total time to make 2,000 random accesses, each of size 1 KB. To test this, generate a random position in the file, seek to that position, and read 1 KB.
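The sequential and random-access measurements above can be sketched against a local file using only the JDK. This is a minimal illustration under stated assumptions: the class ReadBenchmark, its method names, the fixed seed, and the 64 KB buffer are hypothetical, and the code uses java.io.RandomAccessFile rather than the Hadoop API. For HDFS, the same seek-then-read pattern would apply to the stream returned by FileSystem#open, which supports seek.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.util.Random;

// Hypothetical benchmark helper; class and method names are illustrative.
final class ReadBenchmark {
    /** Reads the whole file from start to end and returns the number of bytes read. */
    static long readSequentially(String path) throws IOException {
        byte[] buffer = new byte[64 * 1024]; // assumed buffer size
        long total = 0;
        try (InputStream in = new FileInputStream(path)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
        }
        return total;
    }

    /** Reads `size` bytes at `accesses` random offsets and returns the total bytes read. */
    static long readRandomly(String path, int accesses, int size, long seed) throws IOException {
        byte[] buffer = new byte[size];
        Random random = new Random(seed);
        long total = 0;
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            long maxStart = file.length() - size; // last offset that allows a full read
            if (maxStart < 0) {
                throw new IllegalArgumentException("File is smaller than one read");
            }
            for (int i = 0; i < accesses; i++) {
                // floorMod keeps the offset in [0, maxStart]; the modulo bias is negligible here.
                long position = Math.floorMod(random.nextLong(), maxStart + 1);
                file.seek(position);
                file.readFully(buffer); // reads exactly `size` bytes or throws
                total += size;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: ReadBenchmark <local file>");
            return;
        }
        long start = System.nanoTime();
        long bytes = readSequentially(args[0]);
        System.out.printf("Sequential: %d bytes in %.3f s%n", bytes, (System.nanoTime() - start) / 1e9);

        start = System.nanoTime();
        bytes = readRandomly(args[0], 2000, 1024, 42L);
        System.out.printf("Random: %d bytes in %.3f s%n", bytes, (System.nanoTime() - start) / 1e9);
    }
}
```

System.nanoTime is used because it is a monotonic clock intended for measuring elapsed time. Repeated runs over the same file may be affected by operating system caching, so it is worth noting how many times each measurement was taken.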
Submission instructions
The assignment is due on Monday, 10/22/2018, at 11:59 PM Pacific Time.
Late submissions are allowed with a 20% penalty for each calendar day, up to four days late.
The Java class should be named HDFSUpload in the package edu.ucr.cs.cs226.<UCR net ID>, where <UCR net ID> is replaced with your ID in all lower-case letters. For example, the Java class could be named ‘edu.ucr.cs.cs226.eldawy.HDFSUpload’.
The 2 GB test file can be accessed at the following link: https://drive.google.com/file/d/0B1jY75xGiy7eR3VpNC1XMzB5cWs/view Note that the file is compressed. Make sure to decompress it first.
Please upload your answer in a single ZIP file named ‘cs226-asg1-<UCR net ID>.zip’, where <UCR net ID> is replaced with your ID. The ZIP file should contain a directory named ‘HDFSUpload’ which contains the full directory structure as generated by Maven with your implementation. Please remove any binary files and keep only the HDFSUpload.java file and the pom.xml file. You can optionally include a README file with compilation instructions and a LICENSE file.
Failing to follow the instructions above might result in losing some points.