
EECS Bayesian Models Homework 4: Solution

Problem Set-up

You are given observations $X = \{x_1, \dots, x_n\}$ where each $x_i \in \{0, 1, 2, \dots, 20\}$. You model this as being generated from a Binomial mixture model of the basic form

$$x_i \mid c_i \sim \mathrm{Binomial}(20, \theta_{c_i}), \qquad c_i \overset{\mathrm{iid}}{\sim} \mathrm{Discrete}(\pi).$$

In this homework, you will implement three algorithms for learning this mixture model: one based on maximum likelihood EM, one on variational inference, and one on Gibbs sampling in which $\pi$ is integrated out. Use the data provided for all experiments.
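As a point of reference, the following sketch draws synthetic data from this generative model; the mixing weights and success probabilities below are made up for illustration and are not the parameters underlying the provided data.

```python
# Sampling from the Binomial mixture: c_i ~ Discrete(pi), x_i | c_i ~ Binomial(20, theta_{c_i}).
# The values of pi and theta here are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(0)
pi = np.array([0.5, 0.3, 0.2])       # mixing weights (sum to 1)
theta = np.array([0.1, 0.5, 0.9])    # per-cluster Binomial success probabilities
n = 500

c = rng.choice(len(pi), size=n, p=pi)    # cluster assignments
x = rng.binomial(20, theta[c])           # observed counts in {0, ..., 20}
```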

Problem 1. (30 points)

In this problem, you will derive and implement the EM algorithm for learning maximum likelihood values of $\pi$ and each $\theta_k$ for $k = 1, \dots, K$. Use $c$ as the model variable being integrated out.

    a) Show your derivation of the E and M steps of the algorithm. You may present your solution at the same level of detail as in the notes. Summarize the algorithm in pseudo-code (an illustrative sketch is given after part (c) below).

    b) Implement the EM algorithm and run it for 50 iterations for $K = 3, 9, 15$. In one figure, plot the log marginal likelihood over iterations 2 to 50 for each $K$.

    c) For the final iteration of each model, for integers 0 through 20 plot a scatter plot that has the integer on the x-axis and the index of the most probable cluster on the y-axis. (You will use $p(c \mid x, \pi, \theta)$ to find this.) Show this in three separate figures.
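For concreteness, the sketch below is one possible (unofficial) implementation of the E and M steps for this model, using the standard mixture-model responsibilities and weighted maximum-likelihood updates; the function and variable names are illustrative, and `x` is assumed to be a NumPy array of the provided counts.

```python
# A possible EM sketch for the Binomial(20, theta_k) mixture (not the official solution).
import numpy as np
from scipy.stats import binom

def em_binomial_mixture(x, K, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    n = len(x)
    pi = np.full(K, 1.0 / K)                   # mixing weights
    theta = rng.uniform(0.2, 0.8, size=K)      # Binomial success probabilities
    log_marg = []                              # log marginal likelihood per iteration
    for _ in range(iters):
        # E-step: responsibilities gamma[i, k] = p(c_i = k | x_i, pi, theta)
        log_p = np.log(pi) + binom.logpmf(x[:, None], 20, theta[None, :])   # (n, K)
        log_norm = np.logaddexp.reduce(log_p, axis=1)                       # (n,)
        gamma = np.exp(log_p - log_norm[:, None])
        log_marg.append(log_norm.sum())
        # M-step: re-estimate pi and theta from the weighted data
        Nk = gamma.sum(axis=0)
        pi = Nk / n
        theta = np.clip((gamma * x[:, None]).sum(axis=0) / (20.0 * Nk), 1e-6, 1 - 1e-6)
    return pi, theta, gamma, log_marg
```

For part (c), the most probable cluster for each integer $x \in \{0, \dots, 20\}$ can then be read off as the $k$ maximizing $\log \pi_k + \log \mathrm{Binomial}(x; 20, \theta_k)$.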



Problem 2. (35 points)

In this problem, you will implement a variational inference algorithm for approximating the posterior distribution of the Binomial mixture model. As additional priors, let

$$\pi \sim \mathrm{Dirichlet}(\alpha), \qquad \theta_k \overset{\mathrm{iid}}{\sim} \mathrm{Beta}(a, b), \quad k = 1, \dots, K.$$

Set $\alpha = 1/10$, $a = 0.5$ and $b = 0.5$. Approximate the full posterior distribution of $\pi$, $\theta$ and $c$ with the factorized distribution
$$q(\pi, \theta, c) = q(\pi) \prod_{k=1}^{K} q(\theta_k) \prod_{i=1}^{n} q(c_i)$$
using variational inference.
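As a reminder (a standard mean-field identity, stated here for reference), each optimal factor in part (a) can be derived by exponentiating the expected complete log joint, holding the other factors fixed:
$$\log q^*(\,\cdot\,) = \mathbb{E}_{q(\text{other factors})}\!\big[\log p(x, \pi, \theta, c)\big] + \text{const},$$
applied in turn to $q(\pi)$, each $q(\theta_k)$ and each $q(c_i)$.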

    a) Derive the optimal $q(\pi)$, $q(\theta_k)$ and $q(c_i)$. Summarize the VI algorithm in pseudo-code (an illustrative sketch is given after part (c) below).

    b) Implement the resulting VI algorithm and run it for 1000 iterations for $K = 3, 15, 50$. In one figure, plot the variational objective function for each $K$ over iterations 2 to 1000.

    c) For the final iteration of each model, for integers 0 through 20 plot a scatter plot that has the integer on the x-axis and the index of the most probable cluster on the y-axis. (You will use $q(c)$ to find this.) Show this in three separate figures.
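The sketch below is one possible coordinate-ascent (CAVI) implementation, assuming the usual conjugate updates for this model (a Dirichlet factor for $q(\pi)$, Beta factors for each $q(\theta_k)$ and categorical factors for each $q(c_i)$); computation of the variational objective is omitted for brevity, and the names are illustrative.

```python
# A possible CAVI sketch for q(pi) prod_k q(theta_k) prod_i q(c_i) (not the official solution).
import numpy as np
from scipy.special import digamma

def cavi_binomial_mixture(x, K, alpha=0.1, a=0.5, b=0.5, iters=1000, seed=0):
    rng = np.random.default_rng(seed)
    n = len(x)
    phi = rng.dirichlet(np.ones(K), size=n)    # phi[i, k] = q(c_i = k)
    for _ in range(iters):
        # Update q(pi) = Dirichlet(alpha_hat) and q(theta_k) = Beta(a_hat_k, b_hat_k)
        Nk = phi.sum(axis=0)
        alpha_hat = alpha + Nk
        a_hat = a + (phi * x[:, None]).sum(axis=0)
        b_hat = b + (phi * (20 - x[:, None])).sum(axis=0)
        # Update each q(c_i) from expected log weights and expected log likelihoods
        E_log_pi = digamma(alpha_hat) - digamma(alpha_hat.sum())
        E_log_th = digamma(a_hat) - digamma(a_hat + b_hat)
        E_log_1m = digamma(b_hat) - digamma(a_hat + b_hat)
        log_phi = E_log_pi + x[:, None] * E_log_th + (20 - x[:, None]) * E_log_1m
        phi = np.exp(log_phi - log_phi.max(axis=1, keepdims=True))
        phi /= phi.sum(axis=1, keepdims=True)
    return alpha_hat, a_hat, b_hat, phi
```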


Problem 3. (35 points)

In this problem, you will implement a Bayesian nonparametric Gibbs sampler for a marginalized version of the model in Problem 2, using the ideas from Lecture 10. This means that in theory we use the limit of the prior $\mathrm{Dirichlet}(\alpha/K, \dots, \alpha/K)$ as $K \to \infty$, but derive a marginal sampler for $p(\theta, c \mid x)$ with $\pi$ integrated out. Set $\alpha = 3/4$, $a = 0.5$ and $b = 0.5$.
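For reference, marginalizing $\pi$ in this limit yields the familiar Chinese-restaurant-process prior on the assignments, which supplies the prior term in $p(c_i \mid x_i, \theta, c_{-i})$:
$$p(c_i = k \mid c_{-i}) = \begin{cases} \dfrac{n_{-i,k}}{\alpha + n - 1}, & k \text{ an existing cluster},\\[6pt] \dfrac{\alpha}{\alpha + n - 1}, & k \text{ a new cluster}, \end{cases}$$
where $n_{-i,k}$ is the number of observations other than $x_i$ currently assigned to cluster $k$.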

    a) Derive the conditional posteriors you will need: $p(\theta_k \mid \{x_i : c_i = k\})$ and $p(c_i \mid x_i, \theta, c_{-i})$. Implement the resulting Gibbs sampling algorithm discussed in class and the notes, but applied to this specific data-generating distribution (an illustrative sketch is given after part (c) below). Run your algorithm on the data provided for 1000 iterations with the initial number of clusters set to 30.

    b) Plot the number of observations per cluster as a function of iteration for the six most probable clusters. These should be shown as lines on the same plot that never cross; for example, the m-th value of the "second" line will be the number of observations in the second-largest cluster after completing the m-th iteration. If there are fewer than six clusters in an iteration, set the remaining values to zero.

    c) Plot the total number of clusters that contain data as a function of iteration.
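The sketch below is one possible (unofficial) implementation of such a collapsed sampler: $\pi$ is integrated out, each existing cluster is scored by $n_{-i,k}$ times its Binomial likelihood, a new cluster is scored by $\alpha$ times the Beta-Binomial marginal likelihood of $x_i$, and each $\theta_k$ is resampled from its Beta conditional. The names, data layout and bookkeeping are illustrative.

```python
# A possible collapsed Gibbs sketch for the Dirichlet-process limit of the Binomial
# mixture, with pi integrated out and theta_k sampled (not the official solution).
import numpy as np
from scipy.stats import binom
from scipy.special import betaln, gammaln

def log_new_cluster_lik(xi, a, b, m=20):
    # log of the Beta-Binomial marginal: integral of Binom(xi; m, theta) Beta(theta; a, b) dtheta
    log_comb = gammaln(m + 1) - gammaln(xi + 1) - gammaln(m - xi + 1)
    return log_comb + betaln(a + xi, b + m - xi) - betaln(a, b)

def gibbs_dp_binomial(x, alpha=0.75, a=0.5, b=0.5, iters=1000, init_K=30, seed=0):
    rng = np.random.default_rng(seed)
    n = len(x)
    c = rng.integers(init_K, size=n)                      # initial cluster assignments
    theta = {k: rng.beta(a, b) for k in np.unique(c)}     # per-cluster parameters
    sizes_per_iter = []
    for _ in range(iters):
        for i in range(n):
            # Remove x_i; drop its cluster if it became empty
            counts = {k: int(np.sum(c == k)) for k in theta}
            counts[c[i]] -= 1
            if counts[c[i]] == 0:
                del theta[c[i]], counts[c[i]]
            ks = list(theta)
            log_p = np.array(
                [np.log(counts[k]) + binom.logpmf(x[i], 20, theta[k]) for k in ks]
                + [np.log(alpha) + log_new_cluster_lik(x[i], a, b)]
            )
            p = np.exp(log_p - log_p.max())
            p /= p.sum()
            j = rng.choice(len(p), p=p)
            if j == len(ks):                               # open a new cluster
                new_k = max(theta, default=-1) + 1
                theta[new_k] = rng.beta(a + x[i], b + 20 - x[i])
                c[i] = new_k
            else:
                c[i] = ks[j]
        # Resample theta_k | {x_i : c_i = k} ~ Beta(a + sum x_i, b + sum (20 - x_i))
        for k in list(theta):
            xk = x[c == k]
            theta[k] = rng.beta(a + xk.sum(), b + (20 - xk).sum())
        sizes_per_iter.append(sorted((int(np.sum(c == k)) for k in theta), reverse=True))
    return c, theta, sizes_per_iter
```

The per-iteration sorted cluster sizes in `sizes_per_iter` give the quantities plotted in parts (b) and (c).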
