Objective: In this assignment, you’ll visualize multivariate data from a data table stored in an xlsx or csv file. You may also create your own data table. The data table must contain more than five variables. Each student must use their own unique data table for this assignment.
Specification: You’ll visualize the data using a Parallel Coordinates Plot, as discussed in class. The plot should let you identify relationships among the variables and cluster the observations. The following example (Table 1) shows a data table (education) with these variables: state, reading, math, writing, percent_graduates_sat, pupil_staff_ratio, and drop_rate.
Table 1: Education Data Table (columns: state, reading, math, writing, percent_graduates_sat, pupil_staff_ratio, drop_rate).
Use a Parallel Coordinates Plot to find relationships among the variables across observations. Apply a clustering technique to group the observations so that these relationships are easier to see. Figure 2 shows the parallel coordinates plot for the Education Data Table.
Figure 2: Parallel Coordinates Plot (with Clustering) for Education Data Table (Table 1).
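One possible way to produce a plot like Figure 2 is sketched below. It is only an illustration under stated assumptions: the small education data frame holds made-up placeholder values (not the values of Table 1), k-means with two clusters is just one choice of clustering technique, and parcoord() from the MASS package is one of several functions that can draw parallel coordinates.

```r
# Sketch only: 'education' mimics the columns of Table 1, but every value
# below is a made-up placeholder, not real data.
library(MASS)  # provides parcoord()

education <- data.frame(
  state                 = c("State A", "State B", "State C",
                            "State D", "State E", "State F"),
  reading               = c(560, 510, 580, 495, 530, 570),
  math                  = c(555, 515, 590, 500, 525, 575),
  writing               = c(550, 505, 570, 490, 520, 565),
  percent_graduates_sat = c(9, 65, 5, 80, 50, 7),
  pupil_staff_ratio     = c(7.5, 8.2, 6.9, 8.8, 7.9, 7.1),
  drop_rate             = c(3.5, 5.1, 2.8, 6.0, 4.4, 3.0)
)

# Keep only the numeric variables; 'state' is a label, not an axis.
numeric_vars <- education[, sapply(education, is.numeric)]

# Standardize so that no single variable dominates the distances.
scaled_vars <- scale(numeric_vars)

# k-means with 2 clusters (an assumption; choose the number that suits your data).
set.seed(1)
km <- kmeans(scaled_vars, centers = 2, nstart = 10)
education$cluster <- km$cluster

# One line per state, colored by its cluster.
cluster_colors <- c("steelblue", "firebrick")
parcoord(numeric_vars, col = cluster_colors[education$cluster],
         var.label = TRUE)
```

Coloring the lines by cluster makes it easier to see which variables separate the groups, for example whether states with high test scores also tend to have low drop rates.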
Next, plot a map of the states to analyze how a particular variable varies across them. Figure 3 shows how reading (a variable in the Education data table) varies across the states. Use maps to show the variation of at least three variables across the states.
Figure 3: State map showing the variation of a variable across the states.
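A map like Figure 3 could be drawn along the following lines. This is a hedged sketch, not the method used for the figure: it assumes the ggplot2 and maps packages are installed, the data values are made-up placeholders, and plot_state_variable is a hypothetical helper name introduced here for illustration.

```r
# Placeholder values for three states (not real data).
education <- data.frame(
  state   = c("Texas", "Ohio", "Florida"),
  reading = c(500, 530, 510),
  math    = c(505, 535, 500),
  writing = c(490, 520, 495)
)

# ggplot2::map_data("state") identifies states by lower-case name
# in its 'region' column, so build a matching key.
education$region <- tolower(education$state)

# Hypothetical helper: draws one map for the column named in 'variable'.
plot_state_variable <- function(data, variable) {
  states_map <- ggplot2::map_data("state")   # needs the 'maps' package
  map_df <- merge(states_map, data, by = "region", all.x = TRUE)
  map_df <- map_df[order(map_df$order), ]    # merge() reorders rows; restore outlines
  map_df$value <- map_df[[variable]]         # states without data stay NA (grey)
  ggplot2::ggplot(map_df,
                  ggplot2::aes(x = long, y = lat, group = group, fill = value)) +
    ggplot2::geom_polygon(color = "white") +
    ggplot2::coord_quickmap() +
    ggplot2::labs(title = paste(variable, "by state"), fill = variable)
}

# Draw one map per variable, but only if the assumed packages are available.
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("maps", quietly = TRUE)) {
  for (v in c("reading", "math", "writing")) {
    print(plot_state_variable(education, v))
  }
}
```

Wrapping the plotting code in a function keeps the three required maps consistent, since only the variable name changes between calls.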
To-do list:
Export the data to an Excel sheet or csv file.
Clean the data if needed: adjust rows and columns, or order the data table by a particular variable.
Use a Parallel Coordinates Plot to plot the data for the different states.
Use a clustering technique to cluster the data.
Use maps to visualize the variation of at least three different variables across the states.
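The first two steps (loading the exported file and cleaning it) might look like the sketch below. To stay self-contained it writes a tiny stand-in csv to a temporary file; the rows are made-up placeholders, and in your own work you would instead pass the path of your exported file to read.csv(). For an xlsx file, a function such as read_excel() from the readxl package could be used, assuming that package is installed.

```r
# Stand-in for your own export: a tiny csv written to a temporary file.
# All values are placeholders, not real data.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "state,reading,math,drop_rate",
  "State A,560,555,3.5",
  "State B,510,,5.1",
  "State C,580,590,2.8"
), csv_path)

# Load the table; the empty 'math' field for State B is read as NA.
raw <- read.csv(csv_path, stringsAsFactors = FALSE)

# Clean: drop rows with missing values, then order by one variable.
clean <- na.omit(raw)
clean <- clean[order(clean$reading), ]
rownames(clean) <- NULL

str(clean)  # check column types before plotting
```

Dropping incomplete rows is only one cleaning choice; depending on your data, you might instead fill in missing values or remove a sparsely populated column.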
Submission:
Each student should have their own unique data table. No two submissions should be the same.
The submission should include the R code you use to visualize the data table. You also need to explain how your work answers the following questions:
How do you identify relationships among the variables within a particular observation, and how do you establish relationships among the variables across different observations?
Does clustering help in visualizing the information?
You also need to comment on the clarity of your presentation.
Your final submission will be a zip file containing the following:
The data table (csv file or Excel sheet);
A pdf file named ‘R.pdf’ containing all R code used for the assignment;
A pdf file named ‘Assignment 4.pdf’ that explains your work and describes the steps listed in the to-do list.
Name the zip file as follows: LastName_FirstLetterOfFirstName_Assignment4.zip.
The submission deadline is Friday, December 1.
This assignment is worth 15% of the course evaluation.