Homework #4 Solution

Instructions: Please put all answers in a single PDF with your name and NetID and upload it to Sakai before class on the due date (there is a LaTeX template on the course web site for you to use). Definitely consider working in a group; please include the names of the people in your group and write up your solutions separately. If you look at any references (even Wikipedia), cite them. If you happen to track the number of hours you spent on the homework, it would be great if you could put that at the top of your homework to give us an indication of how difficult it was.

Problem 1

 

EM and K-means

(a) We wish to compute the M-step in an EM algorithm for a Gaussian mixture model.

Starting with the expected complete data log likelihood (auxiliary) function

$$Q(\theta, \theta^{(t-1)}) = \sum_{i=1}^{n} \sum_{k=1}^{K} p(z_i = k \mid x_i, \theta^{(t-1)}) \log[\pi_k] + \sum_{i=1}^{n} \sum_{k=1}^{K} p(z_i = k \mid x_i, \theta^{(t-1)}) \log[p(x_i \mid \theta_k)],$$

where θ = {µk, πk, Σk} (based on Equation 11.26 on page 351 in Murphy's book),

derive the maximum likelihood estimates of µk and πk.

As defined in Equations 11.1 and 11.2 on pages 338–339 of Murphy's book: each base distribution in the mixture model is a multivariate Gaussian with mean vector µk and covariance matrix Σk, and π are the mixing weights, satisfying 0 ≤ πk ≤ 1 and Σ_{k=1}^{K} πk = 1.
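For reference, a sketch of the standard maximization, writing $r_{ik} = p(z_i = k \mid x_i, \theta^{(t-1)})$ for the responsibilities. Maximizing the first term of Q subject to $\sum_k \pi_k = 1$ with a Lagrange multiplier λ gives $\partial Q / \partial \pi_k = \sum_i r_{ik}/\pi_k + \lambda = 0$, and summing over k forces $\lambda = -n$; setting the gradient of the second term with respect to $\mu_k$ to zero, using the Gaussian log density, gives $\sum_i r_{ik}\,\Sigma_k^{-1}(x_i - \mu_k) = 0$. Hence

$$\pi_k = \frac{1}{n}\sum_{i=1}^{n} r_{ik}, \qquad \mu_k = \frac{\sum_{i=1}^{n} r_{ik}\, x_i}{\sum_{i=1}^{n} r_{ik}}.$$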

 

(b) Compare these maximum likelihood updates for π and µ to those of K-means (Section 11.4.2.5 on page 352 in Murphy's book). How are the updates similar and how do they differ? Use 3–4 sentences to explain your answer.
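For reference when making the comparison, the hard-assignment updates of K-means (Murphy §11.4.2.5) can be written as

$$z_i^* = \arg\min_k \|x_i - \mu_k\|_2^2, \qquad \mu_k = \frac{1}{n_k} \sum_{i:\, z_i^* = k} x_i,$$

where $n_k$ is the number of points assigned to cluster $k$. K-means implicitly fixes $\pi_k = 1/K$ and $\Sigma_k = \sigma^2 I$, so there is no update for π at all; this is the standard hard-EM view, not a new result.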

Problem 2

 

Hidden Markov Model

 

[Figure: graphical model of a first-order HMM; hidden chain Z1 → Z2 → · · · → ZT, with each Zt emitting the observation Xt.]

 

In this problem we will be using a first-order Hidden Markov Model (HMM), where our latent variables are denoted Z = {Z1, ..., ZT} and our observed variables X = {X1, ..., XT}, as shown in the figure above. Assume we have observed data D = {x1, ..., xn}, where xi ∈ {0, 1}. Assume the emission and transition distributions are:

 

Xt | Zt ∼ Ber(µ_{Zt})

Zt | Zt−1 ∼ Ber(π_{Zt−1})

 

Note that Zt and Zt−1 appear as subscripts on the parameters µ and π in the Bernoulli distributions above. Our goals are to estimate the parameters θ = {µ, π} and to update our estimates as we observe more data.
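For orientation, the first-order Markov structure in the figure factorizes the complete data likelihood as

$$P(X, Z \mid \theta) = P(z_1) \prod_{t=2}^{T} \pi_{z_{t-1}}^{z_t} (1 - \pi_{z_{t-1}})^{1 - z_t} \prod_{t=1}^{T} \mu_{z_t}^{x_t} (1 - \mu_{z_t})^{1 - x_t},$$

where the initial distribution $P(z_1)$ is left unspecified by the problem, so any concrete choice is an assumption. This factorization is the starting point for parts (a)–(c) below.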

 

(a) Write out the observation likelihood function, P(X | θ).

 

(b) Write out the complete data log likelihood, ℓc(θ) = log P(X, Z | θ).

(c) Write out the expected complete data log likelihood, Q(θ, θ^(t−1)) = E[ℓc(θ) | D, θ^(t−1)].

(d) Explicitly write out the maximum likelihood updates for π and µ (the M-step), assuming the E-step is as we discussed in class.

 

(e) Clearly outline and write out the steps for updating the parameters using the Expectation-Maximization algorithm in pseudocode (you do not need to implement it).
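For part (e), a minimal R sketch of the EM loop (the Baum-Welch algorithm) for this binary HMM, offered as an orientation aid rather than the required pseudocode. Assumptions not in the problem statement: x is a single binary sequence, p1 = P(Z1 = 1) is an explicit initial-state probability, and the starting parameter values are arbitrary; pv mirrors the problem's π (renamed to avoid R's built-in pi).

# Minimal EM (Baum-Welch) sketch for the binary HMM above.
hmm_em <- function(x, n_iter = 50) {
  Tn <- length(x)
  mu <- c(0.3, 0.7)   # mu[j] = P(X_t = 1 | Z_t = j - 1); arbitrary start
  pv <- c(0.4, 0.6)   # pv[j] = P(Z_t = 1 | Z_{t-1} = j - 1); arbitrary start
  p1 <- 0.5           # assumed initial-state probability P(Z_1 = 1)
  for (iter in 1:n_iter) {
    A <- rbind(c(1 - pv[1], pv[1]),              # transition matrix A[j, k]
               c(1 - pv[2], pv[2]))
    B <- cbind(mu[1]^x * (1 - mu[1])^(1 - x),    # B[t, j] = P(x_t | Z_t = j - 1)
               mu[2]^x * (1 - mu[2])^(1 - x))
    ## E-step: scaled forward-backward recursions
    alpha <- matrix(0, Tn, 2); cs <- numeric(Tn)
    alpha[1, ] <- c(1 - p1, p1) * B[1, ]
    cs[1] <- sum(alpha[1, ]); alpha[1, ] <- alpha[1, ] / cs[1]
    for (t in 2:Tn) {
      alpha[t, ] <- (alpha[t - 1, ] %*% A) * B[t, ]
      cs[t] <- sum(alpha[t, ]); alpha[t, ] <- alpha[t, ] / cs[t]
    }
    beta <- matrix(1, Tn, 2)
    for (t in (Tn - 1):1)
      beta[t, ] <- (A %*% (B[t + 1, ] * beta[t + 1, ])) / cs[t + 1]
    gamma <- alpha * beta                        # gamma[t, j] = P(Z_t = j - 1 | x)
    gamma <- gamma / rowSums(gamma)              # numerical safety
    xi <- array(0, c(Tn - 1, 2, 2))              # pairwise posteriors
    for (t in 1:(Tn - 1))
      xi[t, , ] <- (alpha[t, ] %o% (B[t + 1, ] * beta[t + 1, ])) * A / cs[t + 1]
    ## M-step: ratios of expected counts
    mu <- colSums(gamma * x) / colSums(gamma)    # expected emissions of 1 per state
    trans <- apply(xi, c(2, 3), sum)             # expected transition counts
    pv <- trans[, 2] / rowSums(trans)            # expected transitions into state 1
    p1 <- gamma[1, 2]                            # update initial-state probability
  }
  list(mu = mu, pi = pv, loglik = sum(log(cs)))
}

The E-step computes the smoothed posteriors γ and ξ via the scaled forward-backward recursions; the M-step is then just ratios of expected counts, which is exactly the pattern asked for in part (d).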

Problem 3

 

Conceptual clustering algorithm

We have seen in class that we can use the K-means algorithm to identify K clusters in a dataset. Now consider a case where we are given a data set X ∈ ℝ^{1000×1} (download from Sakai) and do not know the appropriate pre-defined cluster number, K. We propose to cluster the data, and determine the number of clusters K from the data, using the following method (an R sketch of the steps appears after the list):

(Note that ηk is the centroid of cluster k, and K is the number of clusters, each with its own centroid: η = {η1, ..., ηK}.)

Adaptive K-means algorithm:

 

1. Initialize by placing all of the data in one cluster (all Xi's are assigned to a single cluster, whose centroid we call η0). Store η0.

 

2. Within the range of the data (i.e., based on centroid η0), randomly generate a new centroid, which we will call η_{K+1}.

 

3. Assign each Xi to one of the K + 1 clusters according to which ηk minimizes the squared Euclidean distance:

$$z_i = \arg\min_k \|x_i - \eta_k\|^2$$

 

 

4. Keep all centroids ηk that have at least one Xi assigned to them. Update the value of K based on the total number of retained centroids, and update the retained centroids based on their assigned data points, as in the standard K-means algorithm.

 

5. Repeat Steps 2–4 until you have reached a reasonable stopping point.
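A minimal R sketch of Steps 1–5 above, for orientation. The fixed number of rounds (n_rounds) stands in for the unspecified stopping rule in Step 5 and is an assumption; x is the 1-D data vector from Sakai.

# Adaptive K-means sketch following Steps 1-5.
adaptive_kmeans <- function(x, n_rounds = 20) {
  eta <- mean(x)                                      # Step 1: one cluster, centroid eta0
  for (iter in 1:n_rounds) {
    eta <- c(eta, runif(1, min(x), max(x)))           # Step 2: propose a centroid in the data range
    d <- outer(x, eta, function(a, b) (a - b)^2)      # Step 3: squared distance to every centroid
    z <- max.col(-d)                                  #   z_i = argmin_k ||x_i - eta_k||^2
    keep <- sort(unique(z))                           # Step 4: drop centroids with no points ...
    eta <- sapply(keep, function(k) mean(x[z == k]))  #   ... and update the retained ones
  }
  list(K = length(eta), eta = eta)                    # Step 5: stop after n_rounds rounds
}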

 

Questions:

 

1. Conceptually, how many clusters do we expect to have? In other words, what value of K should we expect? Answer this before implementing.

 

2. When you implement this method in R, how many clusters do we end up with (please show code)?

 

3. How do you suggest we modify our method to prevent overfitting? Try this approach in your code and report the estimated K. Did it work?
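One possible modification, offered as an assumption on our part rather than the intended answer: score each round with a complexity-penalized cost and revert Steps 2–4 whenever the cost increases. A BIC-style criterion that trades within-cluster error against the number of centroids is one such score:

# Hypothetical BIC-style cost: smaller is better; the length(eta) * log(n)
# term penalizes each additional centroid, discouraging overfitting.
penalized_cost <- function(x, eta) {
  d <- outer(x, eta, function(a, b) (a - b)^2)  # squared distances to centroids
  wss <- sum(apply(d, 1, min))                  # within-cluster sum of squares
  n <- length(x)
  n * log(wss / n) + length(eta) * log(n)
}

In the sketch above, compute penalized_cost before Step 2 and again after Step 4, and keep the proposed centroid only if the cost decreased.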
