Pencil-and-paper execution of EM on coin toss Solution

Starting from:

$30

Home

This assignment concerns the scenario considered in the slides, where a coin Z is tossed to choose between one of two other coins A and B. The chosen coin is then tossed N times. The whole procedure is repeated D times.From the complete data set it would be easy to determine the probabilities of Z indicating A, and for each of A and B, their probabilities of turning up heads. The hidden variable variant is where as data you just know on each trial what the N coin tosses yielded with the chosen coin, but you don’t know which coin was being tossed: the outcome on

is hidden.

Suppose the following data set, where there are just 2 trials, and the chosen coin is tossed just 2 times (this was considered in the slides):

Z tosses of chosen coin

1
?
H
H
2
?
T
T

Below there is a detailed working through of a first iteration of the EM procedure for estimating parameters applied to this. For the assignment you have to give a similar working through of the second iteration.

Suppose the following notation

θA

θB

θH|A θT|A θ

θ

#(d, h)

#(d, t)

P (Z = a)

P (Z = b)

P (h|a) ie. the head prob of coin A

P (t|a) ie. the tail prob of coin A

P (h|b) ie. the head prob of coin B

P (t|b) ie. the tail prob of coin B

num of heads in trial d

num of tails in trial d

If XD is a particular trial – ie. outcomes of the N tosses of a chosen coin – the probability of the version where the chosen coin was A is given by

P (Z = a, XD) = P (Z = a) × P (h|a)#(D,H) × P (t|a)#(D,T)

θA × θH#(|AD,H) × θT#(|AD,T)

and likewise the probability of the version where the chosen coin was B is given by

P (Z = b, XD) = P (Z = b) × P (h|b)#(D,H) × P (t|b)#(D,T)

= θB × θH#(|BD,H) × θT#(|BD,T)

1

From these joint probability formula can work out the conditional probabilities for the hidden variable:

P (Z = a|XD)
=
P (Z = a, XD)

C
P (Z = c, XD)

P (Z = b|XD)
=
P (Z = b, XD)

C
P (Z = c, XD)

In the slides we used the notation γD(Z) for this, where d is index of the data item.

On the particular data set at hand the joint probability formulae are particularly simple

P (Z = a, X1) = θA × θH2|A

P (Z = b, X1) = θB × θH2|B

P (Z = a, X2) = θA × θ2|

T A

P (Z = b, X2) = θB × θ2|

T B

and thus the conditional probalities are:

θA
× θ2

γ1(a)
=

H|A
θA×θ2

+ θB
× θ2

H|A

H|B

θB
× θ2

γ1(b)
=

H|B
θA×θ2

+ θB
× θ2

H|A

H|B

θA×θ2

γ2(a)
=

T|A

θA×θ2

+ θB
× θ2

T|A

T|B

θB×θ2

γ2(b)
=

T|B

θA×θ2

+ θB
× θ2

T|A

T|B

To carry out an EM estimation of the parameters given the data we need some initial setting of the parameters. We will suppose this is:

θA = 12, θB = 12,

θ =3,θ =1

H|A 4 T|A 4

ITERATION 1

For each piece of data have to first compute the conditional probabilities of the hidden variable given the data:

= 1 : p(Z = A, HH) = 0.5 × 0.75 × 0.75 = 0.28125

= 1 : p(Z = B, HH) = 0.5 × 0.5 × 0.5 = 0.125

= 1 :→ sum = 0.40625
= 1 :→ γ1(A) = 0.692308
= 1 :→ γ1(B) = 0.307692

= 2 : p(Z = A, T T ) = 0.5 × 0.25 × 0.25 = 0.03125

= 2 : p(Z = B, T T ) = 0.5 × 0.5 × 0.5 = 0.125

= 2 :→ sum = 0.15625

2

= 2 :→ γ2(A) = 0.2
= 2 :→ γ2(B) = 0.8

Armed with these γ values we now treat each data item XD as if it splits into two versions, one filling out Z as a, and with ’count’ γD(a), and one filling out Z as b, and with ’count’ γD(b).

We then go through this virtual corpus accumulating counts of certain kinds of event. For events of hidden variable being Z = a and Z = b we get

E(A) = γ1(a) + γ2(a) = 0.692308 + 0.2 = 0.892308

E(B) = γ1(b) + γ2(b) = 0.307692 + 0.8 = 1.10769

Then we need to go through the Z = a cases and count types of coin toss, and likewise for Z = b cases

E(A, H) = D γD(a)#(d, h) = (0.692308 × 2 + 0.2 × 0) = 1.38462

E(A, T ) = D γD(a)#(d, t) = (0.692308 × 0 + 0.2 × 2) = 0.4

E(B, H) = D γD(b)#(d, h) = (0.307692 × 2 + 0.8 × 0) = 0.615385

E(B, T ) = D γD(b)#(d, t) = (0.307692 × 0 + 0.8 × 2) = 1.6

Then from these ’expected’ counts we re-estimate parameters

est(θA) = E(A)/2 = 0.892308/2 = 0.446154

est(θB) = E(B)/2 = 1.10769/2 = 0.553846

est(θH|A) = E(A, H)/
X [E(A, X)] = 1.38462/(1.38462 + 0.4) = 1.38462/1.78462 = 0.775862
est(θT|A) = E(A, T )/
X [E(A, X)] = 0.4/(1.38462 + 0.4) = 0.4/1.78462 = 0.224138
est(θH|B) = E(B, H)/
X [E(B, X)] = 0.615385/(0.615385+1.6) = 0.615385/2.21538 = 0.277778
est(θT|B) = E(B, T )/
X [E(B, X)] = 1.6/(0.615385 + 1.6) = 1.6/2.21538 = 0.722222

Note the denominator 2 in the re-estimation formula for θA. inator as E(A) + E(B), but this is D γD(a) + D γD(b) =

We could have written the denom-D[γD(a) + γD(b)] = D[1] = 2

3