Homework #7 Solution

Instructions: Please put all answers in a single PDF with your name and NetID and upload to SAKAI before class on the due date (there is a LaTeX template on the course web site for you to use). Definitely consider working in a group; please include the names of the people in your group, and write up your solutions separately. If you look at any references (even Wikipedia), cite them. If you happen to track the number of hours you spent on the homework, it would be great if you could put that at the top of your homework to give us an indication of how difficult it was.

 

 

Problem 1

 

Expectation Propagation

 

(a) Consider a factorized approximation q(x, y) = q(x)q(y) for the joint distribution p(x, y). Minimize, with respect to q(x) (and, identically, for q(y); no need to write both out), the forward KL divergence, KL(p || q). (Hint: use the standard approach of taking the derivative of our new variational objective with respect to q(x) and setting it to zero. When you get stuck, remember that you should include the constraint that $\sum_x q(x) = 1$ in the form of a Lagrange multiplier.)
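
As a sketch of where the hint leads (assuming discrete x and y, matching the summation constraint), the objective can be expanded as
\[
\mathrm{KL}(p \,\|\, q) = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{q(x)\,q(y)}
= \mathrm{const} - \sum_x p(x) \log q(x) - \sum_y p(y) \log q(y),
\]
where $p(x) = \sum_y p(x,y)$ is the marginal. Adding the Lagrange term $\lambda\left(\sum_x q(x) - 1\right)$, differentiating with respect to $q(x)$, and setting the result to zero gives $q(x) \propto p(x)$; normalization then forces $q(x) = p(x)$.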

 

(b) Write two sentences that capture the variational updates implied by this approach.

 

 

Problem 2

 

Dirichlet distribution

In a number of the models we are about to look at in class, including the latent Dirichlet allocation (LDA) model and the Dirichlet process mixture model, the parameterization of a Dirichlet, α, plays a substantial role in the clustering effect of the model. Recall that the Dirichlet distribution is the conjugate prior for the parameters of a multinomial. In particular, let

 



\[
\pi_j \sim \mathrm{Dir}(\alpha), \qquad
z_{i,j} \sim \mathrm{Mult}(\pi_j), \qquad
x_{i,j} \sim \mathcal{N}\!\left(\mu_{z_{i,j}}, \sigma^2\right).
\]

 

For each object j = 1, ..., p, we generate object-specific mixture proportions πj and, from these, an element-specific class zi,j and a univariate normal value xi,j, where i = 1, ..., n.
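
For concreteness, here is a minimal simulation sketch of this generative process in Python/NumPy; the values of K, n, p, the component means, and σ are arbitrary illustrative choices, not part of the problem.

import numpy as np

rng = np.random.default_rng(0)
K, n, p = 3, 50, 4                       # components, elements per object, objects (illustrative)
alpha = np.full(K, 0.2)                  # Dirichlet concentration (cf. part (a))
mus = np.array([-2.0, 0.0, 2.0])         # illustrative component means mu_k
sigma = 0.5                              # illustrative standard deviation

for j in range(p):
    pi_j = rng.dirichlet(alpha)          # object-specific mixture proportions pi_j
    z_j = rng.choice(K, size=n, p=pi_j)  # element-specific classes z_{i,j}
    x_j = rng.normal(mus[z_j], sigma)    # univariate normal values x_{i,j}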

 

(a) When the entries of α = [α1, ..., αK] are all 0.2, what does this imply about the resulting clustering of elements within each object j?

 

(b) When α = 1, what does this imply about the clustering of elements within each object j?

 

(c) When α = 10, what does this imply about the clustering of elements within each object j?

 

(d) For K = 2, a Dirichlet distribution reduces to a beta distribution. Plot a histogram of samples from a beta(0.2, 0.2), a beta(1, 1), and a beta(10, 10). How do these support your answers above?
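
One way to produce these histograms is sketched below using NumPy and matplotlib; the sample size and bin count are arbitrary choices.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
params = [(0.2, 0.2), (1.0, 1.0), (10.0, 10.0)]

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, (a, b) in zip(axes, params):
    samples = rng.beta(a, b, size=10_000)    # draws from beta(a, b)
    ax.hist(samples, bins=50, density=True)  # normalized histogram
    ax.set_title(f"beta({a}, {b})")
plt.tight_layout()
plt.show()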

 

(e) Looking at these histograms, do you have an intuition for the settings of α for which inference in these models is easier? What is that setting, and why might you suspect that inference is easier?

 

 

Problem 3

 

Project progress

Write a few sentences about your work on your course project this week. What roadblocks have you hit? Do you have questions about specific approaches? Did you get an exciting result you can share with us? Tell us about it.
