Starter Code. Starter code written in Python is provided for Question 2.
[3pts] Robust Regression. One problem with linear regression using squared error loss is that it can be sensitive to outliers. Another loss function we could use is the Huber loss, parameterized by a hyperparameter $\delta$:
$$L_\delta(y, t) = H_\delta(y - t)$$
$$H_\delta(a) = \begin{cases} \frac{1}{2}a^2 & \text{if } |a| \le \delta \\ \delta\left(|a| - \frac{\delta}{2}\right) & \text{if } |a| > \delta \end{cases}$$
(a) [1pt] Sketch the Huber loss $L_\delta(y, t)$ and squared error loss $L_{SE}(y, t) = \frac{1}{2}(y - t)^2$ for $t = 0$, either by hand or using a plotting library. Based on your sketch, why would you expect the Huber loss to be more robust to outliers?
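As a quick numerical sanity check on the sketch (this is a sketch only, not the required answer; `huber` below is a direct transcription of the definition above):

```python
import numpy as np

def huber(a, delta=1.0):
    # H_delta(a): quadratic near zero, linear in the tails
    return np.where(np.abs(a) <= delta,
                    0.5 * a ** 2,
                    delta * (np.abs(a) - 0.5 * delta))

def squared_error(a):
    return 0.5 * a ** 2

# At a large residual (an "outlier") the squared error dwarfs the Huber loss:
# huber(10) = 1 * (10 - 0.5) = 9.5, while squared_error(10) = 50.0,
# so a single outlier dominates the squared-error cost far more.
residuals = np.array([0.5, 1.0, 10.0])
print(huber(residuals))
print(squared_error(residuals))
```

The linear growth of the Huber tails is what limits an outlier's influence on the total cost.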
(b) [1pt] Just as with linear regression, assume a linear model: y = wx + b.
Give formulas for the partial derivatives $\partial L_\delta / \partial w$ and $\partial L_\delta / \partial b$. (We recommend you find a formula for the derivative $H'_\delta(a)$, and then give your answers in terms of $H'_\delta(y - t)$.)
(c) [1pt] Write a Python function gradient_descent(X, y, lr, num_iter, delta) which takes as input:
http://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html
http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_boston.html
CSC411 Homework 3
a training set X: given as a design matrix, an ndarray of shape (N, D)
a target vector y: an ndarray of shape (N,)
scalar lr: learning rate value
num_iter: integer value specifying number of iterations to run gradient descent
delta: hyperparameter $\delta$ for the Huber loss
and returns the value of the parameters w and b after performing (full batch mode) gradient descent to minimize this model's cost function for num_iter iterations. Initialize w and b to all zeros inside the function. Your code should be vectorized, i.e. you should not have a for loop over training examples or input dimensions. You may find the function np.where helpful.
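One way the np.where hint can be used is shown in the sketch below. This is a sketch under stated assumptions, not the official solution: it assumes the cost is the *mean* Huber loss over the batch (the handout does not fix sum vs. mean), and the piecewise derivative inside follows from differentiating the definition of $H_\delta$ above.

```python
import numpy as np

def gradient_descent(X, y, lr, num_iter, delta):
    # Sketch only; assumes cost = mean Huber loss over the batch.
    N, D = X.shape
    w = np.zeros(D)          # weights initialized to zero
    b = 0.0                  # bias initialized to zero
    for _ in range(num_iter):
        resid = X @ w + b - y                       # y_pred - t, shape (N,)
        # H'_delta(a) = a where |a| <= delta, else delta * sign(a)
        dH = np.where(np.abs(resid) <= delta, resid, delta * np.sign(resid))
        w -= lr * (X.T @ dH) / N                    # vectorized batch gradient
        b -= lr * dH.mean()
    return w, b
```

Note there is no loop over examples or dimensions; the only loop is over iterations.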
Submit your code as q1.py.
[6pts] Locally Weighted Regression.
(a) [2pts] Given $\{(x^{(1)}, y^{(1)}), \ldots, (x^{(N)}, y^{(N)})\}$ and positive weights $a^{(1)}, \ldots, a^{(N)}$, show that the solution to the weighted least squares problem
$$w^* = \arg\min_w \frac{1}{2}\sum_{i=1}^{N} a^{(i)}\left(y^{(i)} - w^T x^{(i)}\right)^2 + \frac{\lambda}{2}\|w\|^2 \qquad (1)$$
is given by the formula
$$w^* = \left(X^T A X + \lambda I\right)^{-1} X^T A y \qquad (2)$$
where $X$ is the design matrix (defined in class) and $A$ is a diagonal matrix with $A_{ii} = a^{(i)}$.
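Equation (2) can be evaluated without ever forming an explicit inverse, by treating it as the linear system $(X^T A X + \lambda I)\,w = X^T A y$. A minimal sketch on toy data (the data and $\lambda$ value here are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, lam = 5, 2, 0.1
X = rng.standard_normal((N, D))   # design matrix
y = rng.standard_normal(N)        # targets
a = rng.random(N) + 0.1           # positive per-example weights
A = np.diag(a)

# Solve (X^T A X + lam I) w = X^T A y instead of inverting the matrix.
w = np.linalg.solve(X.T @ A @ X + lam * np.eye(D), X.T @ A @ y)

# The solution satisfies the linear system up to floating-point error.
print(np.allclose((X.T @ A @ X + lam * np.eye(D)) @ w, X.T @ A @ y))  # True
```

This is the same pattern the implementation note below asks for: a linear solver rather than a matrix inverse.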
(b) [2pts] Locally reweighted least squares combines ideas from k-NN and linear regression. For each new test example $x$ we compute distance-based weights for each training example
$$a^{(i)} = \frac{\exp\left(-\|x - x^{(i)}\|^2 / 2\tau^2\right)}{\sum_j \exp\left(-\|x - x^{(j)}\|^2 / 2\tau^2\right)},$$
compute
$$w^* = \arg\min_w \frac{1}{2}\sum_{i=1}^{N} a^{(i)}\left(y^{(i)} - w^T x^{(i)}\right)^2 + \frac{\lambda}{2}\|w\|^2,$$
and predict $\hat{y} = x^T w^*$. Complete the implementation of locally reweighted least squares by providing the missing parts for q2.py.
Important things to notice while implementing: First, do not invert any matrix; use a linear solver (numpy.linalg.solve is one example).
Second, notice that
$$\frac{\exp(A_i)}{\sum_j \exp(A_j)} = \frac{\exp(A_i - B)}{\sum_j \exp(A_j - B)}$$
but if we use $B = \max_j A_j$ it is much more numerically stable, as $\exp$ overflows/underflows easily. This is handled automatically in the scipy package with the scipy.misc.logsumexp function.
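The shift-by-the-max trick can be seen directly in a numpy-only illustration (scipy's logsumexp applies the same shift internally):

```python
import numpy as np

A = np.array([1000.0, 1001.0, 1002.0])

# Naive softmax overflows: exp(1000) is inf in float64, and inf/inf is nan.
with np.errstate(over='ignore', invalid='ignore'):
    naive = np.exp(A) / np.sum(np.exp(A))
print(naive)   # all nan

# Shifting by B = max_j A_j leaves the ratio unchanged but keeps exp in range.
B = A.max()
stable = np.exp(A - B) / np.sum(np.exp(A - B))
print(stable)  # well-defined weights that sum to 1
```

The same issue arises here because the squared distances in the exponent can be large when $\tau$ is small.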
(c) [1pt] Randomly hold out 30% of the dataset as a validation set. Compute the average loss for different values of $\tau$ in the range [10, 1000] on both the training set and the validation set. Plot the training and validation losses as a function of $\tau$ (using a log scale for $\tau$).
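The 30% holdout can be done with a random permutation of indices; a minimal sketch (the variable names and dataset size here are our own illustration, not part of the starter code):

```python
import numpy as np

N = 506                                # e.g. number of examples in the dataset
rng = np.random.default_rng(0)
idx = rng.permutation(N)               # random shuffle of example indices

n_val = int(0.3 * N)                   # 30% held out for validation
val_idx, train_idx = idx[:n_val], idx[n_val:]

print(len(val_idx), len(train_idx))    # 151 355
```

For the log scale, matplotlib's `plt.semilogx` (or `plt.xscale('log')`) plots the losses against $\tau$ on a logarithmic x-axis.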
(d) [1pt] How would you expect this algorithm to behave as $\tau \to \infty$? When $\tau \to 0$? Is this what actually happened?
https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.misc.logsumexp.html