Lab 5 Solution


Instructions

For this lab you will again submit an HTML markdown document. We will work with the San Francisco City Salary data set posted on www.kaggle.com. The data set is available on our Poly Learn, but you may also download it directly from Kaggle.

You must use either the lattice package or the ggplot2 package to create the graphs listed below. You may NOT use the Base Plotting System.

Importing the data requires two settings:

    • Make sure header = TRUE

    • Include na.strings = "Not Provided"
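The two import settings above can be sketched with base R's `read.csv`. This is only an illustration: the inline CSV text is made up so the snippet runs on its own, and the file name `Salaries.csv` in the comment is an assumption about how the download is named.

```r
# Two made-up rows standing in for the real file; only the layout matters here.
csv_text <- "JobTitle,BasePay,Status
Chief of Police,300000,FT
Transit Operator,Not Provided,PT"

# header = TRUE takes column names from the first line;
# na.strings = "Not Provided" turns that placeholder into NA.
salaries <- read.csv(text = csv_text, header = TRUE, na.strings = "Not Provided")

# With the downloaded file the call would look something like this
# (file name assumed):
# salaries <- read.csv("Salaries.csv", header = TRUE, na.strings = "Not Provided")

str(salaries)
```

Without `na.strings`, a pay column containing "Not Provided" would be read as text rather than as numbers, which would break the plots below.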


Exercises

1. Create a plot with overlaid histograms of Base Pay for the 3 jobs with the highest average Base Pay.



[Figure: overlaid histograms of base pay for Chief of Police; Chief, Fire Department; and Gen Mgr, Public Trnsp Dept. x-axis: Base Salary ($); legend: JobTitle.]
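One possible ggplot2 approach to Exercise 1 is sketched here. The data frame is a toy stand-in with made-up values and placeholder job names; only the column names `JobTitle` and `BasePay` come from the data set. The title text and bin count are arbitrary choices.

```r
library(ggplot2)

# Toy stand-in for the salary data: made-up values, for illustration only.
salaries <- data.frame(
  JobTitle = rep(c("Job A", "Job B", "Job C", "Job D"), each = 3),
  BasePay  = c(300, 310, 320,  200, 210, 220,  250, 260, 270,  100, 110, NA)
)

# Mean BasePay per job title, skipping missing values.
mean_pay <- tapply(salaries$BasePay, salaries$JobTitle, mean, na.rm = TRUE)

# The three titles with the highest mean BasePay.
top3_jobs <- names(sort(mean_pay, decreasing = TRUE))[1:3]
top3_data <- subset(salaries, JobTitle %in% top3_jobs & !is.na(BasePay))

# position = "identity" overlays the histograms instead of stacking them;
# alpha below 1 keeps overlapping bars visible.
p1 <- ggplot(top3_data, aes(x = BasePay, fill = JobTitle)) +
  geom_histogram(position = "identity", alpha = 0.5, bins = 10) +
  labs(title = "Base Pay for the 3 Jobs with the Highest Average Base Pay",
       x = "Base Salary ($)", y = "Count")

print(p1)
```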



2. Create a plot with side-by-side boxplots of Base Pay by status (FT or PT).





[Figure: side-by-side boxplots of BasePay by Status (FT, PT). x-axis: Status; y-axis: BasePay.]
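Exercise 2 could be approached along these lines with ggplot2. The toy data frame holds made-up values, and dropping rows whose `Status` is neither FT nor PT is an assumption about how blank status values should be handled.

```r
library(ggplot2)

# Toy stand-in for the salary data: made-up values, for illustration only.
salaries <- data.frame(
  Status  = c("FT", "FT", "FT", "PT", "PT", "PT", "", ""),
  BasePay = c(90, 100, 110, 30, 40, NA, 70, 80)
)

# Assumption: keep only rows with a recorded FT/PT status and a known BasePay.
status_data <- subset(salaries, Status %in% c("FT", "PT") & !is.na(BasePay))

# One box per status; fill = Status gives each box its own colour.
p2 <- ggplot(status_data, aes(x = Status, y = BasePay, fill = Status)) +
  geom_boxplot() +
  labs(title = "Base Pay by Employment Status",
       x = "Status", y = "Base Pay ($)")

print(p2)
```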

3. Create a multipanel plot (one panel for each value of Year) of Total Pay versus Base Pay with overlaid linear fits.

[Figure: TotalPay versus BasePay with linear fits, one panel per year (2011, 2012, 2013, 2014). x-axis: BasePay; y-axis: TotalPay; legend: Year.]
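A minimal ggplot2 sketch for Exercise 3, using `facet_wrap` for the panels and `geom_smooth(method = "lm")` for the linear fits. The data values are made up; only the column names `Year`, `BasePay`, and `TotalPay` come from the data set.

```r
library(ggplot2)

# Toy stand-in for the salary data: made-up values, for illustration only.
salaries <- data.frame(
  Year     = rep(2011:2014, each = 4),
  BasePay  = rep(c(50, 100, 150, 200), times = 4),
  TotalPay = c(60, 115, 170, 230,  55, 120, 160, 240,
               65, 110, 175, 225,  70, 125, 165, 235)
)

pay_data <- subset(salaries, !is.na(BasePay) & !is.na(TotalPay))

# facet_wrap(~ Year) draws one panel per year;
# geom_smooth(method = "lm") overlays a least-squares line in each panel.
p3 <- ggplot(pay_data, aes(x = BasePay, y = TotalPay, colour = as.factor(Year))) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "black") +
  facet_wrap(~ Year, nrow = 1) +
  labs(title = "Total Pay versus Base Pay by Year",
       x = "Base Pay ($)", y = "Total Pay ($)", colour = "Year")

print(p3)
```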

4. Create a multipanel plot (one panel for each Job) of side-by-side boxplots (one boxplot for each value of Year) of BasePay for the 3 jobs that occur most frequently in the data set.

[Figure: boxplots of BasePay by as.factor(Year) (2012, 2013, 2014), one panel per job: Registered Nurse, Special Nurse, Transit Operator. x-axis: as.factor(Year); y-axis: BasePay; legend: as.factor(Year).]

Additional Notes

    • Be sure to add appropriate titles and axis labels to all of your graphs.

    • Make your graphs colorful!

    • Consider adding any other details you think would add to the visual appeal of your graphs without taking away from the data.






















