Provide credit to any sources other than the course staff that helped you solve the problems. This includes all students you talked to regarding the problems.
You can look up definitions/basics online (e.g., Wikipedia, Stack Exchange, etc.).
Submission rules are the same as previous assignments.
Please write your net-id on top of every page. It helps with grading.
Problem 1 (10 points) Different class conditional probabilities. Consider a classification problem with features in $\mathbb{R}^d$, and labels in $\{-1, +1\}$. Consider the class of linear classifiers of the form $(\vec{w}, 0)$, namely all the classifiers (hyperplanes) that pass through the origin (or $t = 0$). Instead of logistic regression, suppose the class probabilities are given by the following function, where $\vec{X} \in \mathbb{R}^d$ are the features:

$$P\left(y = +1 \mid \vec{X}; \vec{w}\right) = \frac{1}{2}\left(1 + \frac{\vec{w} \cdot \vec{X}}{\sqrt{1 + (\vec{w} \cdot \vec{X})^2}}\right), \qquad (1)$$

where $\vec{w} \cdot \vec{X}$ is the dot product between $\vec{w}$ and $\vec{X}$.
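For intuition, here is a minimal NumPy sketch of the class probability in (1); the function name and the toy vectors are illustrative choices, not part of the assignment.

\begin{verbatim}
import numpy as np

def class_prob(w, X):
    """P(y = +1 | X; w) from (1): 0.5 * (1 + w.X / sqrt(1 + (w.X)^2))."""
    s = np.dot(w, X)                    # dot product w . X
    return 0.5 * (1.0 + s / np.sqrt(1.0 + s ** 2))

# Toy check: the probability always lies in (0, 1) and equals 0.5 when w . X = 0.
w = np.array([1.0, -2.0])
X = np.array([0.5, 0.25])
print(class_prob(w, X))                 # w . X = 0 here, so this prints 0.5
\end{verbatim}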
Suppose we obtain $n$ examples $(\vec{X}_i, y_i)$ for $i = 1, \ldots, n$.
1. Show that the log-likelihood function is
$$J(\vec{w}) = -n \log 2 + \sum_{i=1}^{n} \log\left(1 + \frac{y_i \, (\vec{w} \cdot \vec{X}_i)}{\sqrt{1 + (\vec{w} \cdot \vec{X}_i)^2}}\right). \qquad (2)$$
2. Compute the gradient and write one step of gradient ascent. Namely, fill in the blank:

$$\vec{w}_{j+1} = \vec{w}_{j} + \underline{\hspace{4cm}}$$
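A small numerical sketch can help sanity-check both parts above: it evaluates the log-likelihood (2) directly and runs gradient ascent with a finite-difference gradient in place of the analytic gradient you are asked to derive. The toy data, step size, and helper names are assumptions made for illustration.

\begin{verbatim}
import numpy as np

def log_likelihood(w, X, y):
    """Equation (2): -n*log(2) + sum_i log(1 + y_i*(w.X_i)/sqrt(1 + (w.X_i)^2))."""
    s = X @ w                                  # vector of dot products w . X_i
    return -len(y) * np.log(2.0) + np.sum(np.log(1.0 + y * s / np.sqrt(1.0 + s ** 2)))

def numerical_gradient(f, w, eps=1e-6):
    """Central-difference approximation of the gradient of f at w."""
    g = np.zeros_like(w)
    for j in range(len(w)):
        e = np.zeros_like(w)
        e[j] = eps
        g[j] = (f(w + e) - f(w - e)) / (2.0 * eps)
    return g

# Toy data: n = 4 examples in R^2 with labels in {-1, +1}.
X = np.array([[1.0, 2.0], [-1.0, 0.5], [0.3, -1.0], [2.0, 1.0]])
y = np.array([1.0, -1.0, -1.0, 1.0])

w = np.zeros(2)
eta = 0.1                                      # assumed step size
for _ in range(100):                           # gradient *ascent*: we maximize the log-likelihood
    w = w + eta * numerical_gradient(lambda v: log_likelihood(v, X, y), w)
print(w, log_likelihood(w, X, y))
\end{verbatim}

Whatever analytic gradient you derive should agree with the finite-difference gradient up to small numerical error.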
In Problem 2 and Problem 3, we will study linear regression. We will assume in both problems that $w_0 = 0$. (This can be done by translating the features and labels to have mean zero, but we will not worry about it.) For $\vec{w} = (w^1, \ldots, w^d)$ and $\vec{X} = (X^1, \ldots, X^d)$, the regression we want is:

$$y = w^1 X^1 + \cdots + w^d X^d = \vec{w} \cdot \vec{X}. \qquad (3)$$
We considered the following regularized least squares objective, which is called Ridge Regression. For $n$ examples $(\vec{X}_i, y_i)$,

$$J(\vec{w}; \lambda) = \sum_{i=1}^{n} \left(y_i - \vec{w} \cdot \vec{X}_i\right)^2 + \lambda \, \|\vec{w}\|_2^2.$$
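For reference in Problems 2 and 3, the objective above can be written directly in NumPy; this is only a sketch with made-up toy data, and the function name is an arbitrary choice.

\begin{verbatim}
import numpy as np

def ridge_objective(w, X, y, lam):
    """J(w; lambda) = sum_i (y_i - w . X_i)^2 + lambda * ||w||_2^2."""
    residuals = y - X @ w               # vector of y_i - w . X_i
    return np.sum(residuals ** 2) + lam * np.sum(w ** 2)

# Toy usage: n = 3 examples in R^2.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 2.5])
print(ridge_objective(np.array([1.0, 2.0]), X, y, lam=0.5))
\end{verbatim}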
Problem 2 (10 points) Gradient Descent for regression.
1. Instead of using the closed-form expression we derived in class, suppose we want to perform gradient descent to find the optimal solution for $J(\vec{w}; \lambda)$. Please compute the gradient of $J$, and write one step of gradient descent with step size $\eta$.

2. Suppose we get a new point $\vec{X}_{n+1}$; what will the predicted $y_{n+1}$ be when $\lambda \to \infty$?
Problem 3 (15 points) Regularization increases training error. In class we said that when we regularize, we expect to get weight vectors with smaller norm, but we never proved it. We also displayed a plot showing that the training error increases as we regularize more (larger $\lambda$). In this assignment, we will formalize these intuitions rigorously.
Let $0 < \lambda_1 < \lambda_2$ be two regularizer values. Let $\vec{w}_1$ and $\vec{w}_2$ be the minimizers of $J(\vec{w}; \lambda_1)$ and $J(\vec{w}; \lambda_2)$ respectively.

1. Show that $\|\vec{w}_1\|_2^2 \geq \|\vec{w}_2\|_2^2$. Therefore more regularization implies a smaller norm of the solution!

Hint: Observe that $J(\vec{w}_1; \lambda_1) \leq J(\vec{w}_2; \lambda_1)$, and $J(\vec{w}_2; \lambda_2) \leq J(\vec{w}_1; \lambda_2)$ (why?).
2. Show that the training error for $\vec{w}_1$ is no larger than that of $\vec{w}_2$. In other words, show that

$$\sum_{i=1}^{n} \left(y_i - \vec{w}_1 \cdot \vec{X}_i\right)^2 \leq \sum_{i=1}^{n} \left(y_i - \vec{w}_2 \cdot \vec{X}_i\right)^2.$$

Hint: Use the first part of the problem.
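Before proving the two claims, you can also check them empirically. The sketch below fits ridge regression for increasing $\lambda$ using sklearn's Ridge (with fit_intercept=False to match the $w_0 = 0$ assumption); the synthetic data and the $\lambda$ grid are arbitrary choices. The printed norms should be non-increasing and the training errors non-decreasing.

\begin{verbatim}
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=50)   # noisy linear data

for lam in [0.01, 0.1, 1.0, 10.0, 100.0]:
    # sklearn's alpha plays the role of lambda; no intercept since w0 = 0.
    model = Ridge(alpha=lam, fit_intercept=False).fit(X, y)
    w = model.coef_
    train_err = np.sum((y - X @ w) ** 2)
    print(f"lambda={lam:7.2f}  ||w||^2={np.sum(w ** 2):.4f}  training error={train_err:.4f}")
\end{verbatim}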
Problem 4 (25 points) Linear and Quadratic Regression. Please refer to the Jupyter Notebook in the assignment, and complete the coding part in it! You can use the sklearn regression package: http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html
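As a starting point for the quadratic part, one common pattern is to expand the features with sklearn's PolynomialFeatures and then fit the same Ridge class linked above. This is only a sketch with placeholder data and settings; the notebook specifies the actual data, degree, and regularization to use.

\begin{verbatim}
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures

# Placeholder 1-D data; the assignment's notebook provides the real training set.
X_train = np.linspace(-1.0, 1.0, 20).reshape(-1, 1)
y_train = 2.0 * X_train[:, 0] ** 2 - X_train[:, 0] + 0.05 * np.random.randn(20)

# Linear fit.
linear_model = Ridge(alpha=1.0).fit(X_train, y_train)

# Quadratic fit: expand each x to [x, x^2] (the intercept is handled by Ridge itself).
quad = PolynomialFeatures(degree=2, include_bias=False)
X_quad = quad.fit_transform(X_train)
quad_model = Ridge(alpha=1.0).fit(X_quad, y_train)

print(linear_model.coef_, quad_model.coef_)
\end{verbatim}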