$24
Please read these instructions to ensure you receive full credit on your homework. Submit the written portion of your homework as a single PDF le through Courseworks (less than 5MB). In addition to your PDF write-up, submit all code written by you in their original extensions through Courseworks (e.g., .m, .r, .py, etc.). Any coding language is acceptable, but your code should be your own. Do not submit Jupyter or other notebooks, but the original source code only. Do not wrap your les in .rar, .zip, .tar and do not submit your write-up in .doc or other le type. Your grade will be based on the contents of one PDF le and the original source code. Additional les will be ignored. We will not run your code, so everything you are asked to show should be put in the PDF le. Show all work for full credit.
Late submission policy: Late homeworks will have 0.1% deducted from the nal grade for each minute late. Your homework submission time will be based on the time of your last submis-sion to Courseworks. Therefore, do not re-submit after midnight on the due date unless you are con dent the new submission is signi cantly better to overcompensate for the points lost. You can resubmit as much as you like, but each time you resubmit be sure to upload all les you want graded! Submission time is non-negotiable and will be based on the time you submitted your last le to Courseworks. The number of points deducted will be rounded to the nearest integer.
Problem 1. (40 points)
In this homework, you will implement an EM algorithm for the object recommendation problem discussed in class. Here, we have a data set of the form R = frij g restricted to a subset of pairs (i; j) 2 , where i can range from 1; : : : ; N and j can range from 1; : : : ; M and we let rij 2 f 1g. The goal of matrix factorization is to nd a low rank approximation of this data.
To this end, let the unknown model variables be ui 2 Rd and vj 2 Rd for i = 1; : : : ; N and
j = 1; : : : ; M. Let U = fuig and V = fvjg. The model and priors are,
ind
iid
iid
rijjU; VBernoulli( (uiT vj= )); for all (i; j) 2 ;
ui N(0; cI);
vj N(0; cI):
The function ( ) is the CDF of a standard normal random variable, as discussed in class. (Hint:
You will nd the probit regression discussion useful for this homework.)
The goal of this homework will be to derive and implement an EM algorithm for maximizing
ln p(R; U; V ) =
Z q( ) ln
Rq( )
d
+
Z q( ) ln p( jR; U; V )d ;
p( ; U; V; )
q( )
|
{z
}
L(
U;V )
where the auxiliary variable is used in the probit setup similar to that discussed in class for probit regression: = f ijg for (i; j) 2 and rij = sign( ij), ij Normal(uTi vj; 2).1
1Note: In the data, rij 2 f 1; +1g instead of f0; 1g. This won’t impact the nal derivation.
1
Problem 1 will focus on deriving the algorithm. To this end:
Q
a) Derive q( ) for the EM algorithm. Hint: You will be able to show q( ) = (i;j)2 q( ij).
b) Derive L(U; V ). You can use the Eq[ ] notation and write the relevant expectation separately.
c) Derive the U; V that optimize L(U; V ). Hint: Derive for one (and therefore all) ui and vj.
d) Summarize the nal EM algorithm in a list of steps that are easy to read, yet detailed enough for someone to be able to implement it. (Be sure to write out ln p(R; U; V ).)
Problem 2. (35 points)
In this problem, you will implement the EM algorithm you derived in Problem 1 for nding a local optimal solution to ln p(R; U; V ). The data is provided on Courseworks along with a README le explaining the data. For your implementation, set d = 5, c = 1 and 2 = 1. Treat the users as U and the movies as V when implementing the algorithm.2
You will need to initialize your algorithm in some way. One possibility is to start by generating each ui and vj from Normal(0; 0:1I). Please use this method for this assignment.
a) Run your algorithm for 100 iterations and plot ln p(R; U; V ) for iterations 2 through 100.
b) Rerun your algorithm for 100 iterations using 5 di erent random starting points. Plot the 5 di erent objective functions for iterations 20 through 100. Note: This is simply a repeat of Problem 2(a), only showing 5 objective functions instead of one and changing the x-axis.
c) Predict the values given in the test set and show your results in a confusion matrix. Show the raw counts in this confusion matrix.
• Hint: Depending how you implement it, your code may be very fast or very slow. You may want to change the data format so you aren’t looping over the list of ratings. Also, the EM algorithm will work if you change around the orders: e.g., update q( ), then update U, then update q( ), then update V . We didn’t present EM this way|we presented it as doing \E" for everything then \M" for everything in one iteration|but it may help develop your understanding of EM if you think through why this suggestion also works. Your code will work if you ignore this hint, but it may end up being slow, and this hint is probably not the only way to faster code.
2