$39
1. (10 points) Exercise 3.6 (page 92) in LFD.
2. (20 points) Recall the objective function for linear regression can be expressed as
E(w) = N1 kXw yk2;
as in Equation (3.3) of LFD. Minimizing this function with respect to w leads to the optimal
• as (XT X) 1XT y. This solution holds only when XT X is nonsingular. To overcome this problem, the following objective function is commonly minimized instead:
E2(w) = kXw yk2 + kwk2;
where > 0 is a user-speci ed parameter. Please do the following:
(a) (10 points) Derive the optimal w that minimize E2(w).
(b) (10 points) Explain how this new objective function can overcome the singularity problem of XT X.
3. (35 points) In logistic regression, the objective function can be written as
N
E(w) = N1 X ln 1 + e ynwT xn :
n=1
Please
(a) (10 points) Compute the rst-order derivative rE(w). You will need to provide the intermediate steps of derivation.
(b) (10 points) Once the optimal w is obtain, it will be used to make predictions as follows:
(
1
if (wT x) < 0:5
Predicted class of x =
1
if (wT x) 0:5
where the function (z) =
1
looks like
1+e
z
1
Explain why the decision boundary of logistic regression is still linear, though the lin-ear signal wT x is passed through a nonlinear function to compute the outcome of prediction.
(c) (5 points) Is the decision boundary still linear if the prediction rule is changed to the following? Justify brie y.
(
1 if (wT x) 0:9
Predicted class of x =
1 if (wT x) < 0:9
(d) (10 points) In light of your answers to the above two questions, what is the essential property of logistic regression that results in the linear decision boundary?
4. (35 points) Logistic Regression for Handwritten Digits Recognition: Implement lo-gistic regression for classi cation using gradient descent to nd the best separator. The handwritten digits les are in the \data" folder: train.txt and test.txt. The starting code is in the \code" folder. In the data le, each row is a data example. The rst entry is the digit label (\1" or \5"), and the next 256 are grayscale values between -1 and 1. The 256 pixels correspond to a 16 16 image. You are expected to implement your solution based on the given codes. The only le you need to modify is the \solution.py" le. You can test your solution by running \main.py" le. Note that code is provided to compute a two-dimensional feature (symmetry and average intensity) from each digit image; that is, each digit image is represented by a two-dimensional vector before being augmented with a \1" to form a three-dimensional vector as discussed in class. These features along with the corresponding labels should serve as inputs to your logistic regresion algorithm.
(a) (15 points) Complete the logistic regression() function for classifying digits number \1" and \5".
(b) (5 points) Complete the accuracy() function for measuring the classi cation accuracy on your training and test data.
(c) (5 points) Complete the thirdorder() function to transfer the features into 3rd order polynomial Z-space.
(d) (10 points) Run \main.py" to see the classify results. As your nal deliverable to a customer, would you use the linear model with or without the 3rd order polynomial transform? Brie y explain your reasoning.
Deliverable: You should submit (1) a hard-copy report (along with your write-up for other questions) that summarizes your results and (2) the \solution.py" le to the Blackboard.
Note: Please read the \Readme.txt" le carefully before you start this assignment. Please do NOT change anything in the \main.py" and \helper.py" les when you program.
2