Homework #2 Machine Learning Solution


You may complete this homework assignment either individually or in teams of up to 2 people.

1. Age regression: Train an age regressor that analyzes a (48 × 48 = 2304)-pixel grayscale face image and outputs a real number $\hat{y}$ that estimates how old the person is (in years). Your regressor should be implemented using linear regression. The training and testing data are available here:

• https://s3.amazonaws.com/jrwprojects/age_regression_Xtr.npy

• https://s3.amazonaws.com/jrwprojects/age_regression_ytr.npy

• https://s3.amazonaws.com/jrwprojects/age_regression_Xte.npy

• https://s3.amazonaws.com/jrwprojects/age_regression_yte.npy
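As a starting point, here is a minimal sketch of loading one split once the four files have been downloaded into a local directory. The helper name `load_split` and the `data_dir` argument are hypothetical, and the sketch assumes nothing about whether the images are stored as (n, 48, 48) or (n, 2304); it flattens them either way.

```python
import os
import numpy as np

def load_split(data_dir, split):
    """Load one split ("tr" or "te") from locally downloaded .npy files.

    `data_dir` is a hypothetical local folder holding the four files.
    """
    X = np.load(os.path.join(data_dir, f"age_regression_X{split}.npy"))
    y = np.load(os.path.join(data_dir, f"age_regression_y{split}.npy"))
    # Flatten every image into one row of pixels. This works whether the
    # file stores images as (n, 48, 48) or already as (n, 2304).
    X = X.reshape(X.shape[0], -1)
    return X, y
```

With the files in place, `Xtr, ytr = load_split("data", "tr")` and `Xte, yte = load_split("data", "te")` would give row-per-example arrays.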

Note: you must complete this problem using only linear algebraic operations in numpy; you may not use any off-the-shelf linear regression software, as that would defeat the purpose.

(a) One-shot (analytical) solution [20 points]: Compute the optimal weights $\mathbf{w} = (w_1, \ldots, w_{2304})$ and bias term $b$ for a linear regression model by deriving the expression for the gradient of the cost function w.r.t. $\mathbf{w}$ and $b$, setting it to 0, and then solving. The cost function is

$$f_{\text{MSE}}(\mathbf{w}, b) = \frac{1}{2n} \sum_{i=1}^{n} \left(\hat{y}^{(i)} - y^{(i)}\right)^2$$

where $\hat{y} = g(\mathbf{x}; \mathbf{w}, b) = \mathbf{x}^\top \mathbf{w} + b$ and $n$ is the number of examples in the training set $D_{tr} = \{(\mathbf{x}^{(1)}, y^{(1)}), \ldots, (\mathbf{x}^{(n)}, y^{(n)})\}$, each $\mathbf{x}^{(i)} \in \mathbb{R}^{2304}$ and each $y^{(i)} \in \mathbb{R}$. After optimizing $\mathbf{w}$ and $b$ only on the training set, compute and report the cost $f_{\text{MSE}}$ on the training set $D_{tr}$ and (separately) on the testing set $D_{te}$. Suggestion: to solve for $\mathbf{w}$ and $b$ simultaneously, use the trick shown in class whereby each image (represented as a vector $\mathbf{x}$) is appended with a constant 1 term (to yield an appended representation $\tilde{\mathbf{x}}$). Then compute the optimal $\tilde{\mathbf{w}}$ (comprising the original $\mathbf{w}$ and an appended $b$ term) using the closed formula:

$$\tilde{\mathbf{w}} = \left(\tilde{\mathbf{X}} \tilde{\mathbf{X}}^\top\right)^{-1} \tilde{\mathbf{X}} \mathbf{y}$$

where $\tilde{\mathbf{X}}$ holds the appended training examples as its columns. For appending, you might find the functions np.hstack, np.vstack, np.atleast_2d useful.
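The closed-form step can be sketched as follows, on the assumption that the data arrive as an (n, d) array with one example per row (so the appended matrix is transposed to put examples in columns). The function names are hypothetical, and using `np.linalg.solve` on the normal equations rather than forming an explicit inverse is a substitution made here for numerical stability, not something the assignment prescribes.

```python
import numpy as np

def fit_closed_form(X, y):
    """Least-squares weights and bias for X of shape (n, d), y of shape (n,)."""
    n = X.shape[0]
    # Append a constant 1 to every example, then transpose so that
    # examples become columns: X_tilde has shape (d + 1, n).
    X_tilde = np.hstack([X, np.ones((n, 1))]).T
    # Solve (X~ X~^T) w~ = X~ y, which is equivalent to applying the inverse.
    w_tilde = np.linalg.solve(X_tilde @ X_tilde.T, X_tilde @ y)
    # The last entry is the bias; the rest are the pixel weights.
    return w_tilde[:-1], w_tilde[-1]

def f_mse(X, y, w, b):
    """Half mean squared error, matching the 1/(2n) factor in the cost."""
    residual = X @ w + b - y
    return 0.5 * np.mean(residual ** 2)
```

Calling `f_mse` once with the training arrays and once with the testing arrays, using the same fitted `w` and `b`, gives the two numbers to report.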

(b) Gradient descent [25 points]: Pick a random starting value for $\mathbf{w} \in \mathbb{R}^{2304}$ and $b \in \mathbb{R}$ and a small learning rate (e.g., $\epsilon = .001$). (In my code, I sampled each component of $\mathbf{w}$ and $b$ from a Normal distribution with standard deviation 0.01; use np.random.randn.) Then, using the expression for the gradient of the cost function, iteratively update $\mathbf{w}$, $b$ to reduce the cost $f_{\text{MSE}}(\mathbf{w}, b)$. Stop after conducting $T$ gradient descent iterations (I suggest $T = 5000$ with a step size (aka learning rate) of $\epsilon = 0.003$). After optimizing $\mathbf{w}$ and $b$ only on the training set, compute and report the cost $f_{\text{MSE}}$ on the training set $D_{tr}$ and (separately) on the testing set $D_{te}$.
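One possible shape for the update loop is sketched below. It assumes X is an (n, d) array with one example per row, and it uses the gradients that follow from differentiating the cost: (1/n) Xᵀ(ŷ − y) for the weights and the mean residual for the bias. The function name, the `seed` argument, and the use of a seeded `RandomState` (for repeatability) are illustrative choices, not part of the assignment.

```python
import numpy as np

def gradient_descent(X, y, epsilon=0.003, T=5000, seed=0):
    """Minimize the half-MSE cost by batch gradient descent."""
    rng = np.random.RandomState(seed)
    n, d = X.shape
    # Small random start: Normal samples scaled to standard deviation 0.01.
    w = 0.01 * rng.randn(d)
    b = 0.01 * rng.randn()
    for _ in range(T):
        residual = X @ w + b - y        # y_hat - y for every example
        grad_w = X.T @ residual / n     # gradient w.r.t. the weights
        grad_b = residual.mean()        # gradient w.r.t. the bias
        w -= epsilon * grad_w
        b -= epsilon * grad_b
    return w, b
```

Whether a given step size converges depends on the scale of the pixel values, so printing the training cost every few hundred iterations is a cheap way to confirm it is decreasing.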

(c) Regularization [15 points]: Same as (b) above, but change the cost function to include a penalty for $|\mathbf{w}|^2$ growing too large:

$$\tilde{f}_{\text{MSE}}(\mathbf{w}, b) = \frac{1}{2n} \sum_{i=1}^{n} \left(\hat{y}^{(i)} - y^{(i)}\right)^2 + \frac{\alpha}{2n} \mathbf{w}^\top \mathbf{w}$$

where $\alpha \in \mathbb{R}^+$. Set $\alpha = 1.0$ (this worked well for me) and then optimize $\tilde{f}_{\text{MSE}}$ w.r.t. $\mathbf{w}$ and $b$. After optimizing $\mathbf{w}$ and $b$ (using $\tilde{f}_{\text{MSE}}$), compute and report the cost $f_{\text{MSE}}$ (without the L2 term) on the training set $D_{tr}$ and (separately) the testing set $D_{te}$. Important: the regularization should be applied only to the $\mathbf{w}$, not the $b$. I suggest a regularization strength of $\alpha = 0.1$.
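To make the "only w, not b" point concrete, here is a minimal sketch of the regularized cost and its gradient, assuming X is an (n, d) array with one example per row. The function names are hypothetical. The penalty adds (α/n)·w to the weight gradient and leaves the bias gradient unchanged.

```python
import numpy as np

def f_mse_reg(X, y, w, b, alpha):
    """Half-MSE plus the L2 penalty alpha / (2n) * w^T w (bias excluded)."""
    n = X.shape[0]
    residual = X @ w + b - y
    return (residual @ residual) / (2 * n) + alpha * (w @ w) / (2 * n)

def grad_f_mse_reg(X, y, w, b, alpha):
    """Gradient of the regularized cost w.r.t. w and b."""
    n = X.shape[0]
    residual = X @ w + b - y
    grad_w = (X.T @ residual + alpha * w) / n   # penalty contributes alpha * w / n
    grad_b = residual.mean()                    # no penalty term for the bias
    return grad_w, grad_b
```

A quick sanity check is to compare `grad_f_mse_reg` against central finite differences of `f_mse_reg` on a small random problem before running the full optimization.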

(e) Visualizing the machine’s behavior [10 points]: After training the regressors in parts (a), (b), and (c), create a 48 × 48 image representing the learned weights $\mathbf{w}$ (without the $b$ term) from each of the different training methods. Use plt.imshow(). How are the weight vectors from the different methods different? Next, using the regressor in part (c), predict the ages of all the images in the test set and report the RMSE (in years). Then, show the top 5 most egregious errors, i.e., the test images whose ground-truth label $y$ is farthest from your machine’s estimate $\hat{y}$. Include the images, along with associated $y$ and $\hat{y}$ values, in a PDF.
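The numeric parts of this step can be sketched with numpy alone. The helper names below are hypothetical; the sketch assumes `y_hat` and `y` are 1-D arrays of predictions and labels, and that the weight vector has 2304 entries in the same pixel order as the flattened images.

```python
import numpy as np

def rmse(y_hat, y):
    """Root mean squared error, in the same units as y (years)."""
    return np.sqrt(np.mean((y_hat - y) ** 2))

def worst_errors(y_hat, y, k=5):
    """Indices of the k examples with the largest absolute error, worst first."""
    abs_err = np.abs(y_hat - y)
    return np.argsort(abs_err)[::-1][:k]

def weights_as_image(w, side=48):
    """Reshape a flat weight vector (bias excluded) into a side x side array."""
    return np.asarray(w).reshape(side, side)
```

The array returned by `weights_as_image` can be passed directly to plt.imshow(), and the indices from `worst_errors` can be used to pull out the corresponding test images. Because the cost carries a factor of 1/2, the RMSE equals the square root of twice $f_{\text{MSE}}$.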

Submission: Put your solution in a Python file called homework2_WPIUSERNAME.py (or homework2_WPIUSERNAME1_WPIUSERNAME2.py for teams), and show the most egregious errors for part (a) in homework2_errors_WPIUSERNAME.pdf.