CSC 4780/6780 Homework 2


    • Purpose


One of the first things you will do when exploring a new dataset is make a few graphs that give you an intuitive feel for what the data contains.

Also, the last thing you typically do on a project is make the data visualizations that will help your clients understand and believe your analysis.

You will also calculate a gradient. We will be using gradient descent a lot. The reasons will make a lot more sense if you understand gradients.


    • Study


Read pages 133 - 203 in Practical Data Science with Python.


    • Make plots with Matplotlib


9 points Create a Python program called make_plots.py that does the following:



    • read bikes.csv and DOX.csv into pandas dataframes

    • divide a matplotlib figure into 3 rows and 2 columns of subplots

    • make a pie chart of the statuses of the bikes

    • make a histogram of the prices of the bikes

    • make a scatter plot of the price vs. the weight of each bike

    • make a time-series plot of the DOX stock price

    • make a box plot showing the range of prices for each brand

    • make a violin plot showing the range of prices for each brand

    • save the entire figure into a file called plots.png
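To get started on the steps above, here is a minimal sketch of the subplot grid with only two of the six panels filled in. It uses a tiny made-up dataframe instead of bikes.csv, and the column name "status" is an assumption (check the real file). It also writes to a temporary file named sketch_plots.png, a hypothetical name chosen so that it does not overwrite your plots.png.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # render to a file without needing a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for bikes.csv; "status" is an assumed column name
df = pd.DataFrame(
    {
        "status": ["available", "rented", "available", "broken"],
        "purchase_price": [250.0, 410.0, 380.0, 520.0],
    }
)

# 3 rows x 2 columns of subplots; axs is a 3x2 array of Axes objects
fig, axs = plt.subplots(3, 2, figsize=(10, 12))

# Pie chart: count the rows per status rather than hard-coding the categories
counts = df["status"].value_counts()
axs[0, 0].pie(counts.values, labels=counts.index, autopct="%1.1f%%")
axs[0, 0].set_title("Status")

# Histogram of the prices
axs[0, 1].hist(df["purchase_price"], bins=5)
axs[0, 1].set_xlabel("Price")

# Save the whole figure, including the four panels that are still empty
fig.tight_layout()
out_path = os.path.join(tempfile.gettempdir(), "sketch_plots.png")
fig.savefig(out_path)
```

The remaining panels follow the same pattern: pick an entry of axs and call the matching plotting method on it.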


Matplotlib has lots of options, and an important goal of this assignment is to get you to explore some of them. Try to make plots.png look like this:

[Figure: the target image for plots.png]

Your score will be based upon how well your code generates a plot that matches this.

Your code should assume nothing about the data except the names of the columns. That is, don’t hard code any other assumptions about the data in your program.
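As an illustration of what "not hard coding" can look like, here is a small sketch that builds a box plot with one box per brand, discovering the brands from the data. The dataframe is made up, and the column name "brand" is an assumption; only the column names appear in the code, never the brand values themselves.

```python
import matplotlib

matplotlib.use("Agg")  # render without needing a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for bikes.csv; "brand" is an assumed column name
df = pd.DataFrame(
    {
        "brand": ["A", "B", "A", "C", "B", "C"],
        "purchase_price": [250.0, 410.0, 380.0, 520.0, 300.0, 450.0],
    }
)

# Discover the brands from the data instead of listing them by hand
brands = sorted(df["brand"].unique())

# One array of prices per brand, in the same order as the brands list
prices_by_brand = [
    df.loc[df["brand"] == b, "purchase_price"].values for b in brands
]

fig, ax = plt.subplots()
ax.boxplot(prices_by_brand)

# boxplot places the boxes at x = 1, 2, ..., so label those positions
ax.set_xticks(range(1, len(brands) + 1))
ax.set_xticklabels(brands)
ax.set_ylabel("Price")
```

If a new brand showed up in the file, this code would pick it up without any changes.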

We have not covered linear regression yet, so here is some code you can use. Assuming that you have loaded the bikes.csv file into a pandas dataframe called df, this code will give you the slope and y-intercept of the red line in the scatter plot:


from sklearn.linear_model import LinearRegression

df = ...

# Get data as numpy arrays
X = df['purchase_price'].values.reshape(-1, 1)
y = df['weight'].values.reshape(-1, 1)

# Do linear regression
reg = LinearRegression()
reg.fit(X, y)

# Get the parameters
slope = reg.coef_[0]
intercept = reg.intercept_

print(f"Slope: {slope}, Intercept: {intercept}")
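Once you have a slope and an intercept, two points are enough to draw the line. The sketch below is one possible way to do that, using made-up points that lie exactly on y = 2x + 1 rather than the bike data. Because the fitted parameters come back as numpy arrays, it flattens them with np.ravel to get plain floats; the names x_line and y_line are just illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up points lying exactly on y = 2x + 1 (not the bike data)
X = np.array([1.0, 2.0, 3.0, 4.0]).reshape(-1, 1)
y = 2.0 * X + 1.0

reg = LinearRegression()
reg.fit(X, y)

# coef_ and intercept_ are arrays; flatten them and take the single value
slope = float(np.ravel(reg.coef_)[0])
intercept = float(np.ravel(reg.intercept_)[0])

# Evaluate the line at the smallest and largest x values
x_line = np.array([X.min(), X.max()])
y_line = slope * x_line + intercept
```

Passing x_line and y_line to the plot method of the scatter plot's Axes, with color="red", would then draw the line over the points.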


Do this work by yourself. Stack Overflow is OK. A hint from another student is OK. Looking at another student’s code is not OK.

My solution is fewer than 80 lines of code.


    • Derive a gradient


1 point

Let $f : \mathbb{R}^3 \to \mathbb{R}$ be given by

$$f(x, y, z) = y \sin(5x) + e^{yz} + \ln z$$

What is the gradient?

Answer: $\nabla f(x, y, z) = \left( 5y \cos(5x),\; z e^{yz} + \sin(5x),\; y e^{yz} + \frac{1}{z} \right)$

(Feel free to use sympy if your calculus is a little rusty.)
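If you do reach for sympy, a check could look something like this sketch. It defines the function from this problem symbolically and differentiates it with respect to each variable in turn; the variable name grad is just illustrative.

```python
import sympy as sp

x, y, z = sp.symbols("x y z")

# The function from this problem
f = y * sp.sin(5 * x) + sp.exp(y * z) + sp.log(z)

# The gradient is the list of partial derivatives, in the order (x, y, z)
grad = [sp.diff(f, v) for v in (x, y, z)]

for v, partial in zip((x, y, z), grad):
    print(f"df/d{v} = {partial}")
```

Each entry of grad is one component of the gradient vector, so the list should have length three.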

Add the solution here in the LaTeX document and build a PDF from it.


    • What to turn in


If your name is Fred Jones, you will turn in a zip file called HW02_Jones_Fred.zip of a directory called HW02_Jones_Fred. It will contain:


    • bikes.csv

    • DOX.csv


    • make_plots.py

    • plots.png

    • Assignment.pdf


Be sure to format your Python code with black before you submit it.

We will unzip the directory and run your code like this:


cd HW02_Jones_Fred

python3 make_plots.py


    • Criteria for success


After running your code, we will look at the generated plots.png. If your code doesn’t run, you will lose points.

If plots.png doesn’t look basically like target.png, you will lose points.

For the gradient, the vector should be the correct length. Each component should be correct.


    • Extra help


Here is a good video tutorial on Matplotlib: https://youtu.be/UO98lJQ3QGI

Want to get ahead? Web scraping is next: https://youtu.be/tb8gHvYlCFs



























