Machine Learning Homework 3 Solution

Problem 1 (Gaussian process coding) – 30 points

In this problem you will implement the Gaussian process model for regression. You will use the same data used for homework 1 to do this, which is again provided in the data zip file for this homework. Recall that the Gaussian process treats a set of $N$ observations $(x_1, y_1), \ldots, (x_N, y_N)$, with $x_i \in \mathbb{R}^d$ and $y_i \in \mathbb{R}$, as being generated from a multivariate Gaussian distribution as follows,

$$y \sim \mathrm{Normal}(0,\ \sigma^2 I + K), \qquad K_{ij} = K(x_i, x_j) = \exp\left\{ -\frac{1}{b}\,\|x_i - x_j\|^2 \right\}.$$
Here, $y$ is an $N$-dimensional vector of outputs and $K$ is an $N \times N$ kernel matrix. For this problem use the Gaussian kernel indicated above. In the lecture slides, we discuss making predictions for a new $y_0$ given $x_0$, which was Gaussian with mean $\mu(x_0)$ and variance $\Sigma(x_0)$. The equations are shown in the slides.
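For reference, the standard predictive equations derived from the joint Gaussian above (which should match the form in the slides, though you should verify against them) are

$$\mu(x_0) = K(x_0, X)\,(\sigma^2 I + K)^{-1} y, \qquad \Sigma(x_0) = \sigma^2 + K(x_0, x_0) - K(x_0, X)\,(\sigma^2 I + K)^{-1} K(x_0, X)^\top,$$

where $K(x_0, X)$ denotes the $1 \times N$ vector of kernel evaluations between $x_0$ and each training input.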

There are two parameters that need to be set for this model as given above, $\sigma^2$ and $b$.

    a) Write code to implement the Gaussian process and to make predictions on test data (a minimal sketch is given after part (d) below).

    b) For $b \in \{5, 7, 9, 11, 13, 15\}$ and $\sigma^2 \in \{0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1\}$ (60 total pairs $(b, \sigma^2)$), calculate the RMSE on the 42 test points as you did in the first homework. Use the mean of the Gaussian process at the test point as your prediction. Show your results in a table.

    c) Which pair of values was the best, and how does this compare with the first homework? What might be a drawback of the approach in this homework (as given) compared with homework 1?


    d) To better understand what the Gaussian process is doing through visualization, re-run the algorithm using only the 4th dimension of $x_i$ (car weight). Set $b = 5$ and $\sigma^2 = 2$. Show a scatter plot of the data ($x_i[4]$ versus $y_i$ for each point). Also, plot as a solid line the predictive mean of the Gaussian process at each point in the training set. You can think of this problem as asking you to create a test set by duplicating $x_i[4]$ for each $i$ in the training set and then to predict that test set.
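A minimal Python sketch of parts (a) and (b), assuming NumPy and that `X_train`, `y_train`, `X_test`, and `y_test` have been loaded from the provided data files (these variable names and helper functions are illustrative, not part of the assignment):

```python
import numpy as np

def gaussian_kernel(X1, X2, b):
    # K[i, j] = exp(-||x_i - x_j||^2 / b), computed via broadcasting
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dists / b)

def gp_predict(X_train, y_train, X_test, b, sigma2):
    # Predictive mean and variance of the GP at each test point
    N = len(y_train)
    C = sigma2 * np.eye(N) + gaussian_kernel(X_train, X_train, b)
    K0 = gaussian_kernel(X_test, X_train, b)          # M x N cross-kernel
    mean = K0 @ np.linalg.solve(C, y_train)
    # Diagonal of the predictive covariance; K(x0, x0) = exp(0) = 1
    V = np.linalg.solve(C, K0.T)
    var = sigma2 + 1.0 - np.sum(K0 * V.T, axis=1)
    return mean, var

# Part (b): RMSE over all 60 (b, sigma^2) pairs, predicting with the mean
for b in [5, 7, 9, 11, 13, 15]:
    for sigma2 in [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]:
        mu, _ = gp_predict(X_train, y_train, X_test, b, sigma2)
        rmse = np.sqrt(np.mean((mu - y_test) ** 2))
        print(f"b={b}, sigma2={sigma2:.1f}: RMSE={rmse:.4f}")
```

For part (d), the same `gp_predict` can be reused by passing the single car-weight column (kept as a 2-D array of shape $N \times 1$) as both the training and test inputs.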


Problem 2 (Boosting coding) – 30 points

In this problem you will implement boosting for the “least squares” (LS) classifier that we briefly discussed in Lecture 8. Recall that this “classifier” performed least squares linear regression treating the $\pm 1$ labels as real-valued responses. Also recall that we criticized this classifier as being “weak,” without using that word, and so boosting this classifier can be a good illustration of the method (even though it performs well on the data set you will be using).

Using the training data provided, implement boosting for the LS classifier. You should use the bootstrap method as discussed in the slides to do this, where each bootstrap set $B_t$ is the size of the training set. Recall that if your error $\epsilon_t > 0.5$, you can simply change the sign of the regression vector $w$ (including the intercept) and recalculate the error.
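As a rough sketch of what each round involves, assuming NumPy, labels in $\{-1, +1\}$, and an `X` whose intercept column of ones has already been appended (all names below are illustrative):

```python
import numpy as np

def boost_ls(X, y, T, seed=0):
    # X: n x d design matrix (intercept column already appended)
    # y: labels in {-1, +1}; returns per-round (w_t, alpha_t, eps_t)
    rng = np.random.default_rng(seed)
    n = len(y)
    p = np.full(n, 1.0 / n)               # distribution over training points
    rounds = []
    for t in range(T):
        # Draw bootstrap set B_t of size n according to p
        idx = rng.choice(n, size=n, replace=True, p=p)
        # "Weak" LS classifier: regress the +/-1 labels on the bootstrap set
        w = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
        pred = np.where(X @ w >= 0, 1.0, -1.0)
        eps = p[pred != y].sum()
        if eps > 0.5:                      # flip w and recalculate the error
            w, pred, eps = -w, -pred, 1.0 - eps
        # eps is assumed to stay in (0, 0.5] here; clip it if it reaches 0
        alpha = 0.5 * np.log((1.0 - eps) / eps)
        p = p * np.exp(-alpha * y * pred)  # AdaBoost-style reweighting
        p = p / p.sum()
        rounds.append((w, alpha, eps))
    return rounds
```

The boosted prediction after round $t$ is the sign of $\sum_{s \le t} \alpha_s\, \mathrm{sign}(X w_s)$, which is what parts (a) through (d) below ask you to track.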

Information about the data used for this problem can be found here:

https://archive.ics.uci.edu/ml/datasets/Occupancy+Detection+

but you must use the data provided on Courseworks. Note that the intercept dimension hasn’t been included in the features provided, so you should add a dimension equal to 1.

    a) Run your boosted LS classifier for $T = 1500$ rounds. In the same plot, show the training and testing error of $f_{\mathrm{boost}}^{(t)}(\cdot)$ for $t = 1, \ldots, T$.

    b) In a separate plot, show the upper bound on the training error as a function of $t$. You will need to use $\epsilon_t$ to do this. This upper bound is given in the slides for Lecture 13 (see the sketch after this list).

    c) Plot a histogram of the total number of times each training data point was selected by the bootstrap method across all rounds. In other words, sum the histograms of all $B_t$.

    d) In two separate plots, show $\epsilon_t$ and $\alpha_t$ as a function of $t$.
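For part (b), the bound in question is presumably the standard AdaBoost training error bound, $\frac{1}{n}\sum_i \mathbb{1}\{f_{\mathrm{boost}}^{(t)}(x_i) \ne y_i\} \le \exp\big(-2\sum_{s=1}^{t}(\tfrac{1}{2} - \epsilon_s)^2\big)$; check this against the Lecture 13 slides. If so, it can be computed per round from the errors collected during boosting (a sketch, assuming `eps` holds $\epsilon_1, \ldots, \epsilon_T$):

```python
import numpy as np

def training_error_bound(eps):
    # eps: sequence of weak-learner errors eps_1, ..., eps_T from boosting.
    # Returns exp(-2 * sum_{s<=t} (1/2 - eps_s)^2) for each round t.
    return np.exp(-2.0 * np.cumsum((0.5 - np.asarray(eps)) ** 2))
```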
