$24
This assignment concerns the scenario considered in the slides, where a coin Z is tossed to choose between one of two other coins A and B. The chosen coin is then tossed N times. The whole procedure is repeated D times.From the complete data set it would be easy to determine the probabilities of Z indicating A, and for each of A and B, their probabilities of turning up heads. The hidden variable variant is where as data you just know on each trial what the N coin tosses yielded with the chosen coin, but you don’t know which coin was being tossed: the outcome on
is hidden.
Suppose the following data set, where there are just 2 trials, and the chosen coin is tossed just 2 times (this was considered in the slides):
Z tosses of chosen coin
1
?
H
H
2
?
T
T
Below there is a detailed working through of a first iteration of the EM procedure for estimating parameters applied to this. For the assignment you have to give a similar working through of the second iteration.
Suppose the following notation
θA
θB
θH|A θT|A θ
θ
#(d, h)
#(d, t)
P (Z = a)
P (Z = b)
P (h|a) ie. the head prob of coin A
P (t|a) ie. the tail prob of coin A
P (h|b) ie. the head prob of coin B
P (t|b) ie. the tail prob of coin B
num of heads in trial d
num of tails in trial d
If XD is a particular trial – ie. outcomes of the N tosses of a chosen coin – the probability of the version where the chosen coin was A is given by
P (Z = a, XD) = P (Z = a) × P (h|a)#(D,H) × P (t|a)#(D,T)
θA × θH#(|AD,H) × θT#(|AD,T)
and likewise the probability of the version where the chosen coin was B is given by
P (Z = b, XD) = P (Z = b) × P (h|b)#(D,H) × P (t|b)#(D,T)
= θB × θH#(|BD,H) × θT#(|BD,T)
1
From these joint probability formula can work out the conditional probabilities for the hidden variable:
P (Z = a|XD)
=
P (Z = a, XD)
C
P (Z = c, XD)
P (Z = b|XD)
=
P (Z = b, XD)
C
P (Z = c, XD)
In the slides we used the notation γD(Z) for this, where d is index of the data item.
On the particular data set at hand the joint probability formulae are particularly simple
P (Z = a, X1) = θA × θH2|A
P (Z = b, X1) = θB × θH2|B
P (Z = a, X2) = θA × θ2|
T A
P (Z = b, X2) = θB × θ2|
T B
and thus the conditional probalities are:
θA
× θ2
γ1(a)
=
H|A
θA×θ2
+ θB
× θ2
H|A
H|B
θB
× θ2
γ1(b)
=
H|B
θA×θ2
+ θB
× θ2
H|A
H|B
θA×θ2
γ2(a)
=
T|A
θA×θ2
+ θB
× θ2
T|A
T|B
θB×θ2
γ2(b)
=
T|B
θA×θ2
+ θB
× θ2
T|A
T|B
To carry out an EM estimation of the parameters given the data we need some initial setting of the parameters. We will suppose this is:
θA = 12, θB = 12,
θ =3,θ =1
H|A 4 T|A 4
ITERATION 1
For each piece of data have to first compute the conditional probabilities of the hidden variable given the data:
= 1 : p(Z = A, HH) = 0.5 × 0.75 × 0.75 = 0.28125
= 1 : p(Z = B, HH) = 0.5 × 0.5 × 0.5 = 0.125
= 1 :→ sum = 0.40625
= 1 :→ γ1(A) = 0.692308
= 1 :→ γ1(B) = 0.307692
= 2 : p(Z = A, T T ) = 0.5 × 0.25 × 0.25 = 0.03125
= 2 : p(Z = B, T T ) = 0.5 × 0.5 × 0.5 = 0.125
= 2 :→ sum = 0.15625
2
= 2 :→ γ2(A) = 0.2
= 2 :→ γ2(B) = 0.8
Armed with these γ values we now treat each data item XD as if it splits into two versions, one filling out Z as a, and with ’count’ γD(a), and one filling out Z as b, and with ’count’ γD(b).
We then go through this virtual corpus accumulating counts of certain kinds of event. For events of hidden variable being Z = a and Z = b we get
E(A) = γ1(a) + γ2(a) = 0.692308 + 0.2 = 0.892308
E(B) = γ1(b) + γ2(b) = 0.307692 + 0.8 = 1.10769
Then we need to go through the Z = a cases and count types of coin toss, and likewise for Z = b cases
E(A, H) = D γD(a)#(d, h) = (0.692308 × 2 + 0.2 × 0) = 1.38462
E(A, T ) = D γD(a)#(d, t) = (0.692308 × 0 + 0.2 × 2) = 0.4
E(B, H) = D γD(b)#(d, h) = (0.307692 × 2 + 0.8 × 0) = 0.615385
E(B, T ) = D γD(b)#(d, t) = (0.307692 × 0 + 0.8 × 2) = 1.6
Then from these ’expected’ counts we re-estimate parameters
est(θA) = E(A)/2 = 0.892308/2 = 0.446154
est(θB) = E(B)/2 = 1.10769/2 = 0.553846
est(θH|A) = E(A, H)/
X [E(A, X)] = 1.38462/(1.38462 + 0.4) = 1.38462/1.78462 = 0.775862
est(θT|A) = E(A, T )/
X [E(A, X)] = 0.4/(1.38462 + 0.4) = 0.4/1.78462 = 0.224138
est(θH|B) = E(B, H)/
X [E(B, X)] = 0.615385/(0.615385+1.6) = 0.615385/2.21538 = 0.277778
est(θT|B) = E(B, T )/
X [E(B, X)] = 1.6/(0.615385 + 1.6) = 1.6/2.21538 = 0.722222
Note the denominator 2 in the re-estimation formula for θA. inator as E(A) + E(B), but this is D γD(a) + D γD(b) =
We could have written the denom-D[γD(a) + γD(b)] = D[1] = 2
3