Starting from:
$30

$24

Machine Learning Homework 4 Solution


Problem 1 (K-means) – 20 points

Implement the K-means algorithm discussed in class. Generate 500 observations from a mixture of three Gaussians on R2 with mixing weights = [0:2; 0:5; 0:3] and means and covariances ,
1
=
0
;  1
=
0
1
;
2
=
0
;  2 =
0
1
;
3
=
3
;  3
=
0
1
:


0


1
0



3

1
0



0


1
0





    a) For K = 2; 3; 4; 5, plot the value of the K-means objective function per iteration for 20 iterations (the algorithm may converge before that).

    b) For K = 3; 5, plot the 500 data points and indicate the cluster of each for the final iteration by marking it with a color or a symbol.


Problem 2 (Matrix factorization) – 40 points

In this problem, you will implement the MAP inference algorithm for the matrix completion problem discussed in class. As a reminder, for users u 2 Rd and movies v 2 Rd, we have
ui    N(0;    1I);    i = 1; : : : ; N1;    vj    N(0;    1I);    j = 1; : : : ; N2:

We are given an N1 N2 matrix M with missing values. Given the set = f(i; j) : Mij is measuredg, for each (i; j) 2 we model Mij N(uTi vj; 2).

You should run your code on the user-movie ratings dataset provided on Courseworks and the course website. For your algorithm, set 2 = 0:25, d = 10 and = 1. Train the model on the larger training set for 100 iterations. For each user-movie pair in the test set, predict the rating using the relevant dot product. Note that the mean rating has been subtracted from the data and you do not need to round your prediction. Since the equations are in the slides, there’s no need to re-derive it.

    a) Run your code 10 times. For each run, initialize your ui and vj vectors as N(0; I) random vectors. On a single plot, show the the log joint likelihood for iterations 2 to 100 for each run. In a table, show in each row the final value of the training objective function next to the RMSE on the testing set. Sort these rows according to decreasing value of the objective function.

    b) For the run with the highest objective value, pick the movies “Star Wars” “My Fair Lady” and “Goodfellas” and for each movie find the 10 closest movies according to Euclidean distance using their respective locations vj. List the query movie, the ten nearest movies and their distances. A mapping from index to movie is provided with the data.

More products