Machine Learning Homework 1 Solution

Problem 1 (written) – 25 points

Imagine we have a sequence of $N$ observations $(x_1, \dots, x_N)$, where each $x_i \in \{0, 1\}$. We model this sequence as i.i.d. random variables from a Bernoulli distribution with unknown parameter $\pi \in [0, 1]$, where

$p(x_i \mid \pi) = \pi^{x_i} (1 - \pi)^{1 - x_i}$

(a) What is the joint likelihood of the data $(x_1, \dots, x_N)$?

(b) Derive the maximum likelihood estimate $\hat{\pi}_{ML}$ for $\pi$.

To help learn $\pi$, you use a prior distribution. You select the distribution $p(\pi) = \mathrm{Beta}(a, b)$.

(c) Derive the maximum a posteriori (MAP) estimate $\hat{\pi}_{MAP}$ for $\pi$.

(d) Use Bayes rule to derive the posterior distribution of $\pi$ and identify the name of this distribution.

(e) What are the mean and variance of $\pi$ under this posterior? Discuss how they relate to $\hat{\pi}_{ML}$ and $\hat{\pi}_{MAP}$.
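
A numerical sanity check can help validate closed-form answers to parts (b) and (c). The sketch below is not part of the assignment. It writes the unknown Bernoulli parameter as `pi`, evaluates the joint log-likelihood and the unnormalized log-posterior on a grid, and reports the grid maximizers. The data `x`, the prior parameters `a` and `b`, and all function names are hypothetical choices made for illustration.

```python
import numpy as np

def bernoulli_log_likelihood(x, pi):
    """Joint log-likelihood of i.i.d. Bernoulli data x at parameter value(s) pi."""
    x = np.asarray(x, dtype=float)
    pi = np.asarray(pi, dtype=float)
    k = x.sum()   # number of ones in the data
    n = x.size    # number of observations
    return k * np.log(pi) + (n - k) * np.log1p(-pi)

def beta_log_prior(pi, a, b):
    """Log of the Beta(a, b) density, up to an additive constant."""
    pi = np.asarray(pi, dtype=float)
    return (a - 1) * np.log(pi) + (b - 1) * np.log1p(-pi)

# Hypothetical data and prior parameters, chosen only for illustration.
x = np.array([1, 0, 1, 1, 0, 1, 1, 1])
a, b = 2.0, 2.0

# Open interval avoids log(0) at the endpoints; grid spacing is 0.001.
grid = np.linspace(0.001, 0.999, 999)
log_lik = bernoulli_log_likelihood(x, grid)
log_post = log_lik + beta_log_prior(grid, a, b)

pi_ml_numeric = grid[np.argmax(log_lik)]
pi_map_numeric = grid[np.argmax(log_post)]
print("grid maximizer of the likelihood:", pi_ml_numeric)
print("grid maximizer of the posterior: ", pi_map_numeric)
```

Comparing these two numbers with the values your derived formulas give for the same `x`, `a`, and `b` is a quick way to catch algebra mistakes. Agreement is only up to the grid resolution.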

Problem 2 (coding) – 35 points

In this problem you will analyze data using the linear regression techniques we have discussed. The goal of the problem is to predict the miles per gallon a car will get using six quantities (features) about that car. The zip file containing the data can be found on Courseworks.[1] The data is broken into training and testing sets. Each row in both “X” files contains six features for a single car (plus a 1 in the 7th dimension), and the same row in the corresponding “y” file contains the miles per gallon for that car.

Remember to submit all original source code with your homework. Put everything you are asked to show below in the PDF file.

Part 1. Using the training data only, write code to solve the ridge regression problem

$\mathcal{L} = \lambda \|w\|^2 + \sum_{i=1}^{350} \|y_i - x_i^T w\|^2.$

(a) For $\lambda = 0, 1, 2, 3, \dots, 5000$, solve for $w_{RR}$. (Notice that when $\lambda = 0$, $w_{RR} = w_{LS}$.) In one figure, plot the 7 values in $w_{RR}$ as a function of $df(\lambda)$. You will need to call a built-in SVD function to do this (all details are in the slides). Be sure to label your 7 curves by their dimension in $x$.

(b) The 4th dimension (car weight) and 6th dimension (car year) clearly stand out over the other dimensions. What information can we get from this?

(c) For $\lambda = 0, \dots, 50$, predict all 42 test cases. Plot the root mean squared error (RMSE)[2] on the test set as a function of $\lambda$, not as a function of $df(\lambda)$. What does this figure tell you when choosing $\lambda$ for this problem (and when choosing between ridge regression and least squares)?
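
As an illustration of the computations in Part 1, here is a minimal sketch of ridge regression through the SVD. It assumes the standard identities for this objective: with $X = U S V^T$, the solution is $w_{RR} = V \,\mathrm{diag}\!\left(s_j / (s_j^2 + \lambda)\right) U^T y$ and the degrees of freedom are $df(\lambda) = \sum_j s_j^2 / (s_j^2 + \lambda)$. The function names and the synthetic data are hypothetical; the real assignment uses the provided training and test files, which are not loaded here.

```python
import numpy as np

def ridge_svd(X, y, lambdas):
    """Solve ridge regression for each lambda using one SVD of X.

    Returns an array of weight vectors (one row per lambda) and the
    matching degrees of freedom df(lambda).
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Uty = U.T @ y
    W, df = [], []
    for lam in lambdas:
        shrink = s / (s**2 + lam)            # equals 1/s when lam == 0
        W.append(Vt.T @ (shrink * Uty))      # V diag(shrink) U^T y
        df.append(np.sum(s**2 / (s**2 + lam)))
    return np.array(W), np.array(df)

def rmse(y_true, y_pred):
    """Root mean squared error between two vectors."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return np.sqrt(np.mean(diff**2))

# Synthetic stand-in data with the same shapes as the assignment
# (350 training rows, 42 test rows, 6 features plus a constant 1).
rng = np.random.default_rng(0)
w_true = rng.normal(size=7)
X_train = np.hstack([rng.normal(size=(350, 6)), np.ones((350, 1))])
X_test = np.hstack([rng.normal(size=(42, 6)), np.ones((42, 1))])
y_train = X_train @ w_true + 0.5 * rng.normal(size=350)
y_test = X_test @ w_true + 0.5 * rng.normal(size=42)

lambdas = np.arange(0, 51)                   # range used in part (c)
W, df = ridge_svd(X_train, y_train, lambdas)
test_rmse = np.array([rmse(y_test, X_test @ w) for w in W])
```

Computing the SVD once and reusing it for every value of $\lambda$ keeps the sweep over 5001 values in part (a) cheap. Plotting the columns of `W` against `df` gives the figure for part (a), and `test_rmse` against `lambdas` gives the figure for part (c).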

Part 2. Modify your code to learn a pth-order polynomial regression model for $p = 1, 2, 3$. (You’ve already done $p = 1$ above.) For this implementation, do not include the cross terms; instead use the method discussed in the slides.

(d) In one figure, plot the test RMSE as a function of the regularization parameter $\lambda = 0, \dots, 500$ for $p = 1, 2, 3$. Based on this plot, which value of $p$ should you choose and why? How does your assessment of the ideal value of $\lambda$ change for this problem?
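
One possible reading of "no cross terms" is to append elementwise powers of each of the six features while keeping the constant 1 as the last column. The sketch below follows that reading. It also standardizes the appended columns using statistics from the training data only, which is an assumption about preprocessing rather than something stated in this handout; check the slides for the intended method. The function name `polynomial_features` and the synthetic data are hypothetical.

```python
import numpy as np

def polynomial_features(X, p, stats=None):
    """Append powers 2..p of each non-constant column of X (no cross terms).

    Assumes the last column of X is the constant 1. The appended columns
    are standardized with `stats` = (mean, std); when `stats` is None the
    statistics are computed from X itself (use this for training data).
    """
    feats, ones = X[:, :-1], X[:, -1:]
    powers = [feats**k for k in range(2, p + 1)]
    if not powers:                     # p == 1: nothing to append
        return X.copy(), stats
    extra = np.hstack(powers)
    if stats is None:
        stats = (extra.mean(axis=0), extra.std(axis=0))
    mu, sigma = stats
    extra = (extra - mu) / sigma
    return np.hstack([feats, extra, ones]), stats

# Synthetic stand-in data with the assignment's shapes.
rng = np.random.default_rng(1)
X_train = np.hstack([rng.normal(size=(350, 6)), np.ones((350, 1))])
X_test = np.hstack([rng.normal(size=(42, 6)), np.ones((42, 1))])

# Fit the standardization on the training set, then reuse it on the test set.
Z_train, stats = polynomial_features(X_train, p=3)
Z_test, _ = polynomial_features(X_test, p=3, stats=stats)
print(Z_train.shape, Z_test.shape)
```

The expanded matrices can then be passed to the same ridge regression code used in Part 1. Reusing the training statistics on the test set avoids leaking test information into the model.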

[1] See https://archive.ics.uci.edu/ml/datasets/Auto+MPG for more details on this dataset. Since I have done some preprocessing, you must use the data provided with this homework.
[2] $\mathrm{RMSE} = \sqrt{\frac{1}{42} \sum_{i=1}^{42} \left(y_i^{\text{test}} - y_i^{\text{pred}}\right)^2}.$