
Homework Assignment #9 Solution

Question 1: Lake Mendota Ice [100 points]




Can you predict how harsh this Wisconsin winter will be?




The Wisconsin State Climatology Office keeps a record of the number of days Lake Mendota was covered by ice at http://www.aos.wisc.edu/~sco/lakes/Mendota-ice.html.




Write a program Ice.java with the following command line format:




$java Ice FLAG [arg1 arg2]




where the two optional arguments are real-valued.




Question 5 is worth 20 points; each other question is worth 10 points.




1. As with most real problems, the data is not as clean or as organized as one would like for machine learning. Curate a clean data set starting from 1855-56 and ending in 2017-18. Let x be the year: for 1855-56, x = 1855; for 2017-18, x = 2017; and so on. Let y be the number of ice days in that year: for 1855-56, y = 118; for 2017-18, y = 94; and so on. Some years have multiple freeze-thaw cycles, such as 2001-02; for that year, x = 2001 and y = 21. Although we do not ask you to hand in any visualization code or figures, we strongly advise you to plot the data (using any plotting tool, inside or outside Java) and see what it looks like.



For simplicity, hard-code the data set in your program. When FLAG=100, print out the data set, one year per line with the x value first, then a space, then the y value. For example,




$java Ice 100
1855 118




...




2001 21




...




2017 94
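As a sketch of the expected FLAG=100 output format, here is a minimal program that hard-codes a few (x, y) pairs taken from the examples above. The class name and the three-pair subset are illustrative only; your solution must contain the full curated 1855-2017 record.

```java
// Sketch of the FLAG=100 case. Only three placeholder pairs are shown;
// the real program hard-codes the entire curated data set.
public class IcePrintSketch {
    static final int[] X = {1855, 2001, 2017}; // years (from the examples above)
    static final int[] Y = {118, 21, 94};      // ice days in those years

    public static void main(String[] args) {
        // One year per line: x value, a space, then the y value.
        for (int i = 0; i < X.length; i++) {
            System.out.println(X[i] + " " + Y[i]);
        }
    }
}
```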






















2. When FLAG=200, print n (the number of data points), the sample mean

    ȳ = (1/n) Σ_{i=1}^{n} y_i,

and the sample standard deviation

    √( (1/(n−1)) Σ_{i=1}^{n} (y_i − ȳ)² ),

on three lines. For real values in this homework, keep two digits after the decimal point. For example (the numbers in this example are made up):




$java Ice 200




321




123.45




32.10
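The FLAG=200 statistics can be computed as below. This is a sketch on made-up toy labels, with hypothetical class and method names; note the n−1 denominator in the sample standard deviation.

```java
// Sketch of the FLAG=200 statistics on toy labels (not the real record).
public class StatsSketch {
    static double mean(double[] v) {
        double s = 0;
        for (double e : v) s += e;
        return s / v.length;
    }

    static double sampleStd(double[] v) {
        double m = mean(v), ss = 0;
        for (double e : v) ss += (e - m) * (e - m);
        return Math.sqrt(ss / (v.length - 1)); // n-1 denominator (sample std)
    }

    public static void main(String[] args) {
        double[] y = {2, 4, 6};                    // toy ice-day counts
        System.out.println(y.length);              // prints 3
        System.out.printf("%.2f%n", mean(y));      // prints 4.00
        System.out.printf("%.2f%n", sampleStd(y)); // prints 2.00
    }
}
```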







3. We will perform linear regression with the model




    f(x) = β0 + β1 x.




We first define the mean squared error as a function of β0 and β1:




    MSE(β0, β1) = (1/n) Σ_{i=1}^{n} (β0 + β1 x_i − y_i)².




When FLAG=300, arg1=β0 and arg2=β1. Print the corresponding MSE.




$java Ice 300 300.00 -0.10




331.15




$java Ice 300 0 0




10885.20




$java Ice 300 100.00 0




384.59




$java Ice 300 400 0.1




241661.16




$java Ice 300 200 -0.2




84225.76
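A sketch of the MSE computation on toy data; the class name and data are illustrative, not the Mendota record.

```java
// MSE as a function of (beta0, beta1), sketched on toy data.
public class MseSketch {
    static double mse(double b0, double b1, double[] x, double[] y) {
        double s = 0;
        for (int i = 0; i < x.length; i++) {
            double r = b0 + b1 * x[i] - y[i]; // prediction error on item i
            s += r * r;
        }
        return s / x.length;
    }

    public static void main(String[] args) {
        double[] x = {0, 1, 2}, y = {1, 3, 5};        // toy data on the line y = 1 + 2x
        System.out.printf("%.2f%n", mse(1, 2, x, y)); // perfect fit: prints 0.00
        System.out.printf("%.2f%n", mse(0, 0, x, y)); // (1 + 9 + 25)/3: prints 11.67
    }
}
```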




4. We perform gradient descent on the MSE. At the current parameter (β0, β1), the gradient is defined by the vector of partial derivatives:

    ∂MSE(β0, β1)/∂β0 = (2/n) Σ_{i=1}^{n} (β0 + β1 x_i − y_i)        (1)

    ∂MSE(β0, β1)/∂β1 = (2/n) Σ_{i=1}^{n} (β0 + β1 x_i − y_i) x_i    (2)







When FLAG=400, arg1=β0 and arg2=β1. Print the corresponding gradient as two numbers on separate lines.




$java Ice 400 0 0




-205.01




-396046.91




$java Ice 400 100 0




-5.01




-8846.91




$java Ice 400 300 -0.1




7.79




15491.09




$java Ice 400 400 0.1




982.19




1902815.09






$java Ice 400 200 -0.2




-579.41




-1121770.91
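Equations (1) and (2) translate to code along these lines; again a sketch on toy data with hypothetical names.

```java
// The gradient of the MSE, per equations (1) and (2), sketched on toy data.
public class GradSketch {
    static double[] grad(double b0, double b1, double[] x, double[] y) {
        int n = x.length;
        double g0 = 0, g1 = 0;
        for (int i = 0; i < n; i++) {
            double r = b0 + b1 * x[i] - y[i];
            g0 += 2.0 / n * r;        // equation (1)
            g1 += 2.0 / n * r * x[i]; // equation (2)
        }
        return new double[]{g0, g1};
    }

    public static void main(String[] args) {
        double[] x = {0, 1, 2}, y = {1, 3, 5};        // toy data on the line y = 1 + 2x
        double[] g = grad(1, 2, x, y);                // gradient vanishes at the exact fit
        System.out.printf("%.2f %.2f%n", g[0], g[1]); // prints 0.00 0.00
    }
}
```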




5. Gradient descent starts from an initial parameter (β0^(0), β1^(0)) and iterates the following updates at time t = 1, 2, …, T:

    β0^(t) = β0^(t−1) − η ∂MSE(β0^(t−1), β1^(t−1))/∂β0    (3)

    β1^(t) = β1^(t−1) − η ∂MSE(β0^(t−1), β1^(t−1))/∂β1    (4)

where η > 0 is the step size.







When FLAG=500, arg1=η (the step size) and arg2=T. Start from the initial parameter (β0^(0), β1^(0)) = (0, 0). Perform T iterations of gradient descent. In each iteration, print the following on one line, separated by spaces: t, β0^(t), β1^(t), MSE(β0^(t), β1^(t)). For example,



$java Ice 500 1e-7 5




1 0.00 0.04 1082.37




2 0.00 0.05 469.99




3 0.00 0.05 431.74




4 0.00 0.05 429.35




5 0.00 0.05 429.20




$java Ice 500 1e-8 5




1 0.00 0.00 9375.50




2 0.00 0.01 8083.77




3 0.00 0.01 6978.55




4 0.00 0.01 6032.91




5 0.00 0.02 5223.81




$java Ice 500 1e-9 5




1 0.00 0.00 10728.94




2 0.00 0.00 10575.01




3 0.00 0.00 10423.38




4 0.00 0.00 10274.02




5 0.00 0.00 10126.89




$java Ice 500 1e-6 5
1 0.00 0.40 442280.27
2 -0.00 -2.18 18672210.33
3 0.01 14.56 789034169.47
4 -0.05 -94.24 33343056376.02
5 0.32 613.00 1409013738521.32



Note that with η = 1e-6, gradient descent diverges.
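The update loop in equations (3) and (4) can be sketched as follows, on toy data and with an illustrative η and T (the real program would use the Mendota record and read η, T from the command line):

```java
// Gradient-descent loop for the 1-D linear model, sketched on toy data.
public class GdSketch {
    static double[] fit(double[] x, double[] y, double eta, int T) {
        double b0 = 0, b1 = 0; // beta_0^(0) = beta_1^(0) = 0
        int n = x.length;
        for (int t = 1; t <= T; t++) {
            double g0 = 0, g1 = 0;
            for (int i = 0; i < n; i++) {
                double r = b0 + b1 * x[i] - y[i]; // residual on item i
                g0 += 2.0 / n * r;                // equation (1)
                g1 += 2.0 / n * r * x[i];         // equation (2)
            }
            // Update both parameters from the OLD values (see Hints).
            b0 -= eta * g0;
            b1 -= eta * g1;
        }
        return new double[]{b0, b1};
    }

    public static void main(String[] args) {
        double[] x = {0, 1, 2}, y = {1, 3, 5};        // toy data on the line y = 1 + 2x
        double[] b = fit(x, y, 0.1, 500);             // illustrative eta and T
        System.out.printf("%.2f %.2f%n", b[0], b[1]); // prints 1.00 2.00
    }
}
```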







It is not required for hand in, but try different initial parameters, different η, and a much larger T, and see how small you can make the MSE.




6. Instead of using gradient descent, we can compute the closed-form solution for the parameters directly. For ordinary least squares in 1D, this is

    β̂1 = Σ_i (x_i − x̄)(y_i − ȳ) / Σ_i (x_i − x̄)²

    β̂0 = ȳ − β̂1 x̄,

where x̄ = (Σ_i x_i)/n and ȳ = (Σ_i y_i)/n. Required: When FLAG=600, print β̂0, β̂1 (note the order), and the corresponding MSE on a single line, separated by spaces.

It is not required for hand in, but give it some thought: what does a negative β̂1 mean?
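The closed-form formulas can be sketched like this (toy data and hypothetical names; a FLAG=600 handler would also print the corresponding MSE):

```java
// Closed-form 1-D ordinary least squares, sketched on toy data.
public class OlsSketch {
    // Returns {b0, b1}: intercept first, then slope.
    static double[] ols(double[] x, double[] y) {
        int n = x.length;
        double xbar = 0, ybar = 0;
        for (int i = 0; i < n; i++) { xbar += x[i]; ybar += y[i]; }
        xbar /= n;
        ybar /= n;
        double num = 0, den = 0;
        for (int i = 0; i < n; i++) {
            num += (x[i] - xbar) * (y[i] - ybar);
            den += (x[i] - xbar) * (x[i] - xbar);
        }
        double b1 = num / den;        // slope
        double b0 = ybar - b1 * xbar; // intercept
        return new double[]{b0, b1};
    }

    public static void main(String[] args) {
        double[] x = {0, 1, 2}, y = {1, 3, 5};        // toy data on the line y = 1 + 2x
        double[] b = ols(x, y);
        System.out.printf("%.2f %.2f%n", b[0], b[1]); // prints 1.00 2.00
    }
}
```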
7. Using β̂0 and β̂1 you can predict the number of ice days for a future year. Required: When FLAG=700, arg1=year. Print a single real number, which is the predicted number of ice days for that year. For example,



$java Ice 700 2050
80.75




It is not required for hand in, but give it some thought:




What's the prediction for this winter?




In which year does the predicted number of ice days become negative? What does that say about the model?




8. If you are agonizing over your inability to get gradient descent to match the closed-form solution in question 5, you are not alone. The culprit is the scale of the input x compared to the scale of the implicit offset value 1 (think of β0 as β0 · 1). Gradient descent converges slowly when these scales differ greatly, a situation known as a bad condition number in optimization. When FLAG=800, we ask you to first normalize the input x (note: not the labels y):








    x̄ = (1/n) Σ_{i=1}^{n} x_i                          (5)

    std_x = √( (1/(n−1)) Σ_{i=1}^{n} (x_i − x̄)² )      (6)

    x_i ← (x_i − x̄) / std_x.                           (7)



Then proceed exactly as when FLAG=500. The two arguments are again arg1=η and arg2=T. For example,




$java Ice 800 1 5




1 205.01 -17.90 10883.24




2 -0.00 -0.22 10881.32







3 205.01 -17.69 10879.45




4 -0.00 -0.43 10877.62




5 205.01 -17.47 10875.84




$java Ice 800 0.1 5
1 20.50 -1.79 7073.86
2 36.90 -3.22 4634.55
3 50.02 -4.37 3073.35
4 60.52 -5.29 2074.16
5 68.91 -6.03 1434.66



$java Ice 800 0.01 5




1 2.05 -0.18 10465.96
2 4.06 -0.35 10063.31
3 6.03 -0.53 9676.61
4 7.96 -0.70 9305.22
5 9.85 -0.86 8948.54



With η = 0.1 you should get convergence within 100 iterations.




(Note that the β's are now for the normalized version of x, but you can easily translate them back to the original x with a little algebra. This is not required for the homework.)
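The normalization in equations (5)-(7) might look like this in code (toy inputs and hypothetical names):

```java
// z-score normalization of the inputs, per equations (5)-(7).
public class NormalizeSketch {
    static double[] normalize(double[] x) {
        int n = x.length;
        double mean = 0;
        for (double v : x) mean += v;
        mean /= n;                             // equation (5)
        double ss = 0;
        for (double v : x) ss += (v - mean) * (v - mean);
        double std = Math.sqrt(ss / (n - 1));  // equation (6), n-1 denominator
        double[] z = new double[n];
        for (int i = 0; i < n; i++) {
            z[i] = (x[i] - mean) / std;        // equation (7)
        }
        return z;
    }

    public static void main(String[] args) {
        double[] x = {1855, 1856, 1857};       // toy years
        double[] z = normalize(x);
        System.out.printf("%.2f %.2f %.2f%n", z[0], z[1], z[2]); // prints -1.00 0.00 1.00
    }
}
```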




9. Now we implement Stochastic Gradient Descent (SGD). With everything the same as in part 8 (including the x normalization), we modify the definition of the gradient in equations (1) and (2) as follows. In iteration t we randomly pick one of the n items; say we picked (x_{j_t}, y_{j_t}). We approximate the gradient using that item only:

    ∂MSE(β0, β1)/∂β0 ≈ 2 (β0 + β1 x_{j_t} − y_{j_t})            (8)

    ∂MSE(β0, β1)/∂β1 ≈ 2 (β0 + β1 x_{j_t} − y_{j_t}) x_{j_t}.   (9)

When FLAG=900, print the same information as in part 8. For example (your results will differ because of randomness in the items picked):




$java Ice 900 0.1 5




1 17.80 24.89 8614.29




2 45.98 -12.13 3502.30




3 56.93 -12.36 2385.58




4 67.55 -16.63 1577.17




5 76.76 -17.41 1030.71




With η = 0.1 you should approximately converge within a few hundred iterations.




Since n is small in our data set, SGD has little advantage over gradient descent. On large data sets, however, SGD becomes much more desirable.
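A sketch of the SGD loop, on toy data with a fixed random seed and hypothetical names (the assignment's FLAG=900 handler would also print t, the parameters, and the MSE each iteration):

```java
import java.util.Random;

// Stochastic gradient descent: one randomly chosen item per iteration,
// using the single-item gradients of equations (8) and (9). Toy data only.
public class SgdSketch {
    static double[] fit(double[] x, double[] y, double eta, int T, long seed) {
        Random rng = new Random(seed);
        double b0 = 0, b1 = 0;
        for (int t = 1; t <= T; t++) {
            int j = rng.nextInt(x.length);     // pick item j_t uniformly at random
            double r = b0 + b1 * x[j] - y[j];
            b0 -= eta * 2 * r;                 // equation (8)
            b1 -= eta * 2 * r * x[j];          // equation (9)
        }
        return new double[]{b0, b1};
    }

    public static void main(String[] args) {
        // Toy data with already-normalized x; the exact fit is y = 3 + 2x.
        double[] x = {-1, 0, 1}, y = {1, 3, 5};
        double[] b = fit(x, y, 0.1, 2000, 0);
        System.out.printf("%.2f %.2f%n", b[0], b[1]); // converges toward (3, 2)
    }
}
```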




Hints:




Update β0 and β1 simultaneously in an iteration. Don't use a new β0 to calculate β1.



Use double instead of float in Java.



Don't round the variables themselves to two digits in the intermediate stages; that is for printing only.
