Introduction
With the data given and looking at Figure 1, we can see that the $Y$ values come from a mixture distribution, where the mean of the smaller values appears to be around $\mu_1 = 110$ and the mean of the larger values is close to $\mu_2 = 150$. Therefore, when we construct our prior we will use these values as the prior means. Also, looking at the kernel density line, we can tell that the variance of the smaller-mean distribution is greater than that of the larger-mean distribution, so we will use values of $\sigma_1^2 = 100$ and $\sigma_2^2 = 64$. Judging by how similar the peaks of the two distributions are, I believe that an equal number of observations come from each distribution, and my priors will reflect this.
Figure 1: Histogram of Data Given
Priors
\begin{align*}
p &\sim \text{beta}(a, b) \\
X_i &\sim \text{Bernoulli}(p) \\
Y_i &\sim N(\mu_1, \sigma_1^2) \quad \text{if } X_i = 1 \\
Y_i &\sim N(\mu_2, \sigma_2^2) \quad \text{if } X_i = 0 \\
\mu_j &\sim N(\mu_0, \tau_0^2), \quad j = 1, 2 \\
1/\sigma_j^2 &\sim \text{Gamma}(\nu_0/2,\; \nu_0\sigma_0^2/2), \quad j = 1, 2
\end{align*}
The joint distribution for all of the variables is
\begin{align*}
p(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, p, \tilde{X} \mid \tilde{Y})
&\propto p(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, p, \tilde{X}, \tilde{Y}) \\
&= p(\tilde{Y} \mid \tilde{X}, \mu_1, \mu_2, \sigma_1^2, \sigma_2^2)
   \prod_{j=1}^{2} p(\mu_j)\, p(1/\sigma_j^2) \cdot p(p)\, p(\tilde{X} \mid p).
\end{align*}
Sampling Distribution
The sampling distribution for the Y values can be given by
\begin{align*}
p(\tilde{Y} \mid \tilde{X}, \mu_1, \mu_2, \sigma_1^2, \sigma_2^2, p)
&= \prod_{i=1}^{n} p(Y_i \mid X_i, \mu_1, \mu_2, \sigma_1^2, \sigma_2^2) \\
&= \prod_{i=1}^{n} \text{dnorm}(y_i; \mu_1, \sigma_1^2)^{x_i}\,
   \text{dnorm}(y_i; \mu_2, \sigma_2^2)^{1 - x_i},
\end{align*}
where $\text{dnorm}(y_i; \mu_1, \sigma_1^2)$ signifies the pdf of a $N(\mu_1, \sigma_1^2)$ distribution evaluated at $y_i$.
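Since the dnorm notation above matches R's density function, this sampling density is straightforward to evaluate in code. Below is a minimal sketch of my own (assuming vectors y and x hold the data and the labels); note that R's dnorm is parameterized by the standard deviation, so we pass sqrt of the variance.
\begin{verbatim}
# Log sampling density of y given the labels x (a 0/1 vector).
# R's dnorm takes the standard deviation, not the variance.
loglik <- function(y, x, mu1, mu2, sigma2.1, sigma2.2) {
  sum(x * dnorm(y, mu1, sqrt(sigma2.1), log = TRUE) +
      (1 - x) * dnorm(y, mu2, sqrt(sigma2.2), log = TRUE))
}
\end{verbatim}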
Full Conditionals
The full conditional distribution of Xi can be given by
\begin{align*}
P(X_i = 1 \mid Y_i, \mu_1, \mu_2, \sigma_1^2, \sigma_2^2, p)
&= \frac{P(X_i = 1 \mid p)\; p(Y_i \mid x_i = 1, \mu_1, \sigma_1^2)}
        {p(Y_i \mid \mu_1, \mu_2, \sigma_1^2, \sigma_2^2, p)} \\
&= \frac{p \cdot p(Y_i \mid x_i = 1, \mu_1, \sigma_1^2)}
        {p \cdot p(Y_i \mid x_i = 1, \mu_1, \sigma_1^2)
         + (1 - p) \cdot p(Y_i \mid x_i = 0, \mu_2, \sigma_2^2)} \\
&= \frac{p \cdot \text{dnorm}(y_i; \mu_1, \sigma_1^2)}
        {p \cdot \text{dnorm}(y_i; \mu_1, \sigma_1^2)
         + (1 - p) \cdot \text{dnorm}(y_i; \mu_2, \sigma_2^2)}.
\end{align*}
Therefore, we can conclude that
\[
X_i \mid y_i, \mu_1, \mu_2, \sigma_1^2, \sigma_2^2, p \;\sim\;
\text{Bernoulli}\!\left(
\frac{p \cdot \text{dnorm}(y_i; \mu_1, \sigma_1^2)}
     {p \cdot \text{dnorm}(y_i; \mu_1, \sigma_1^2)
      + (1 - p) \cdot \text{dnorm}(y_i; \mu_2, \sigma_2^2)}
\right).
\]
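In R this update is a single vectorized draw per iteration of the sampler. A sketch, where y is the data vector and the remaining variables are the current values in the chain:
\begin{verbatim}
# Draw the component labels X from their full conditional.
p1 <- p * dnorm(y, mu1, sqrt(sigma2.1))
p0 <- (1 - p) * dnorm(y, mu2, sqrt(sigma2.2))
x  <- rbinom(length(y), size = 1, prob = p1 / (p1 + p0))
\end{verbatim}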
Now, finding the full conditional distribution of $p$, we get
\begin{align*}
p(p \mid \tilde{Y}, \tilde{X}, \mu_1, \mu_2, \sigma_1^2, \sigma_2^2)
&\propto \text{joint pdf of all variables from above} \\
&\propto p(p)\, p(\tilde{X} \mid p) \\
&= \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\, p^{a-1}(1-p)^{b-1}
   \prod_{i=1}^{n} p^{x_i}(1-p)^{1-x_i} \\
&\propto p^{\sum x_i + a - 1}(1-p)^{\,n - \sum x_i + b - 1}.
\end{align*}
Looking at this result, we can conclude that p comes from a beta distribution,
given by
\[
p \mid \tilde{Y}, \tilde{X}, \mu_1, \mu_2, \sigma_1^2, \sigma_2^2 \;\sim\;
\text{beta}\!\left(\sum x_i + a,\; n - \sum x_i + b\right).
\]
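This conjugate update is a single rbeta call in R; a sketch using the current labels x:
\begin{verbatim}
# Draw p from its full conditional given the current labels.
p <- rbeta(1, sum(x) + a, length(y) - sum(x) + b)
\end{verbatim}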
Now we find the full conditional for $\mu_1$:
\begin{align*}
p(\mu_1 \mid \tilde{Y}, \tilde{X}, p, \mu_2, \sigma_1^2, \sigma_2^2)
&\propto \text{joint pdf of all variables} \\
&\propto p(\tilde{Y} \mid \tilde{X}, \mu_1, \mu_2, \sigma_1^2, \sigma_2^2)\; p(\mu_1) \\
&= \prod_{i=1}^{n} \text{dnorm}(y_i; \mu_1, \sigma_1^2)^{x_i}\,
   \text{dnorm}(y_i; \mu_2, \sigma_2^2)^{1-x_i} \cdot \text{dnorm}(\mu_1; \mu_0, \tau_0^2) \\
&\propto \prod_{i:\, x_i = 1} \text{dnorm}(y_i; \mu_1, \sigma_1^2) \cdot \text{dnorm}(\mu_1; \mu_0, \tau_0^2) \\
&\propto \exp\left\{ -\frac{1}{2\tau_0^2}(\mu_1 - \mu_0)^2
   - \frac{1}{2\sigma_1^2} \sum_{i:\, x_i = 1} (y_i - \mu_1)^2 \right\}.
\end{align*}
Then, ignoring the $-1/2$ in front, inside the $\exp\{\cdot\}$ we have
\[
\frac{1}{\tau_0^2}\left(\mu_1^2 - 2\mu_1\mu_0 + \mu_0^2\right)
+ \frac{1}{\sigma_1^2}\left(\sum y_i^2 - 2\mu_1 \sum y_i + n_1\mu_1^2\right)
= a\mu_1^2 - 2b\mu_1 + c,
\]
where the sums run over $\{i : x_i = 1\}$, $n_1$ is the number of such observations, and
\[
a = \frac{1}{\tau_0^2} + \frac{n_1}{\sigma_1^2}, \qquad
b = \frac{\mu_0}{\tau_0^2} + \frac{\sum y_i}{\sigma_1^2}, \qquad
c = c(\mu_0, \tau_0^2, \sigma_1^2, \tilde{y}).
\]
Now putting it all together, we get
\begin{align*}
p(\mu_1 \mid \tilde{Y}, \tilde{X}, p, \mu_2, \sigma_1^2, \sigma_2^2)
&\propto \exp\left\{ -\tfrac{1}{2}\left(a\mu_1^2 - 2b\mu_1\right) \right\} \\
&= \exp\left\{ -\tfrac{a}{2}\left(\mu_1^2 - 2b\mu_1/a + b^2/a^2\right) + \tfrac{b^2}{2a} \right\} \\
&\propto \exp\left\{ -\tfrac{a}{2}\left(\mu_1 - b/a\right)^2 \right\}
 = \exp\left\{ -\tfrac{1}{2}\,\frac{(\mu_1 - b/a)^2}{1/a} \right\}.
\end{align*}
This function follows the shape of a normal density with mean $b/a$ and variance $1/a$. Therefore, if $Y_1 = \{Y_i : X_i = 1\}$, $n_1$ is the size of $Y_1$, and $\bar{y}_1$ is the mean of the values in $Y_1$, we can conclude
\[
\mu_1 \mid \tilde{Y}, \tilde{X}, p, \mu_2, \sigma_1^2, \sigma_2^2 \;\sim\; N(\mu_{n,1}, \tau_{n,1}^2),
\]
where
\[
\tau_{n,1}^2 = \frac{1}{\dfrac{1}{\tau_0^2} + \dfrac{n_1}{\sigma_1^2}} = \frac{1}{a},
\qquad \text{and} \qquad
\mu_{n,1} = \left(\frac{\mu_0}{\tau_0^2} + \frac{n_1 \bar{y}_1}{\sigma_1^2}\right)\tau_{n,1}^2 = \frac{b}{a}.
\]
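A sketch of this update in R (note that if no observations are currently labeled 1, then $n_1 = 0$ and the draw falls back to the prior):
\begin{verbatim}
# Draw mu1 from its full conditional.
y1      <- y[x == 1]
n1      <- length(y1)
tau2.n1 <- 1 / (1 / tau2.0 + n1 / sigma2.1)                 # 1/a
mu.n1   <- (mu.0 / tau2.0 + sum(y1) / sigma2.1) * tau2.n1   # b/a
mu1     <- rnorm(1, mu.n1, sqrt(tau2.n1))
\end{verbatim}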
Following the same process for $\mu_2$, we end up with
\[
\mu_2 \mid \tilde{Y}, \tilde{X}, p, \mu_1, \sigma_1^2, \sigma_2^2 \;\sim\; N(\mu_{n,2}, \tau_{n,2}^2),
\]
where
\[
\tau_{n,2}^2 = \frac{1}{\dfrac{1}{\tau_0^2} + \dfrac{n_2}{\sigma_2^2}},
\qquad
\mu_{n,2} = \left(\frac{\mu_0}{\tau_0^2} + \frac{n_2 \bar{y}_2}{\sigma_2^2}\right)\tau_{n,2}^2,
\]
and where $Y_2 = \{Y_i : X_i = 0\}$, $n_2$ is the size of $Y_2$, and $\bar{y}_2$ is the mean of the values in $Y_2$.
Finally, we must find the full conditional distributions of $\sigma_1^2$ and $\sigma_2^2$. Since we know the prior distribution of $1/\sigma_1^2$, we will find the full conditional in terms of the precision, writing $\tilde{\sigma}_1^2 = 1/\sigma_1^2$. Therefore,
\begin{align*}
p(1/\sigma_1^2 \mid \tilde{Y}, \tilde{X}, p, \mu_1, \mu_2, \sigma_2^2)
&\propto \text{joint pdf of all variables} \\
&\propto p(\tilde{Y} \mid \tilde{X}, \mu_1, \mu_2, \sigma_1^2, \sigma_2^2)\; p(1/\sigma_1^2) \\
&\propto (\tilde{\sigma}_1^2)^{n_1/2}
   \exp\!\left( -\tilde{\sigma}_1^2 \sum_{i:\, x_i=1} (y_i - \mu_1)^2 / 2 \right)
   \cdot (\tilde{\sigma}_1^2)^{\nu_0/2 - 1}
   \exp\!\left( -\tilde{\sigma}_1^2\, \nu_0\sigma_0^2 / 2 \right) \\
&= (\tilde{\sigma}_1^2)^{(\nu_0 + n_1)/2 - 1}
   \exp\left\{ -\tilde{\sigma}_1^2 \left[ \nu_0\sigma_0^2
   + \sum_{i:\, x_i=1} (y_i - \mu_1)^2 \right] / 2 \right\},
\end{align*}
where the sum runs over the $y_i$ whose corresponding $x_i$ values equal 1. This function takes the shape of a gamma density; therefore,
\[
1/\sigma_1^2 \mid \tilde{Y}, \tilde{X}, p, \mu_1, \mu_2, \sigma_2^2 \;\sim\;
\text{Gamma}\!\left(\frac{\nu_{n,1}}{2},\; \frac{\nu_{n,1}\,\sigma_{n,1}^2}{2}\right),
\]
where
\[
\nu_{n,1} = \nu_0 + n_1, \qquad
\sigma_{n,1}^2 = \frac{1}{\nu_{n,1}}\left[\nu_0\sigma_0^2 + n_1\, s_{n,1}^2(\mu_1)\right],
\qquad \text{and} \qquad
s_{n,1}^2(\mu_1) = \frac{1}{n_1}\sum_{i:\, x_i=1} (y_i - \mu_1)^2.
\]
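In R this is one rgamma draw; rgamma's shape/rate parameterization matches the $\text{Gamma}(\nu_{n,1}/2,\, \nu_{n,1}\sigma_{n,1}^2/2)$ form above. A sketch, continuing from the $\mu_1$ update:
\begin{verbatim}
# Draw the precision 1/sigma1^2 from its full conditional.
nu.n1    <- nu.0 + n1
ss1      <- nu.0 * sigma2.0 + sum((y1 - mu1)^2)  # = nu.n1 * sigma2.n1
prec1    <- rgamma(1, shape = nu.n1 / 2, rate = ss1 / 2)
sigma2.1 <- 1 / prec1
\end{verbatim}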
Again, the same logic gives the full conditional distribution of $1/\sigma_2^2$:
\[
1/\sigma_2^2 \mid \tilde{Y}, \tilde{X}, p, \mu_1, \mu_2, \sigma_1^2 \;\sim\;
\text{Gamma}\!\left(\frac{\nu_{n,2}}{2},\; \frac{\nu_{n,2}\,\sigma_{n,2}^2}{2}\right),
\]
where
\[
\nu_{n,2} = \nu_0 + n_2, \qquad
\sigma_{n,2}^2 = \frac{1}{\nu_{n,2}}\left[\nu_0\sigma_0^2 + n_2\, s_{n,2}^2(\mu_2)\right],
\qquad \text{and} \qquad
s_{n,2}^2(\mu_2) = \frac{1}{n_2}\sum_{i:\, x_i=0} (y_i - \mu_2)^2.
\]
Gibbs Sampler
I implemented a Gibbs sampler with $a = b = 1$, because I knew nothing in advance about the proportion of people coming from the low-mean versus the high-mean distribution; $\mu_0 = 140$, the mean of the entire set of $y$ values; $\tau_0^2 = 625$ and $\sigma_0^2 = 625$, which seemed like reasonable values for the prior variances; and $\nu_0 = 5$. I then ran the Gibbs sampler for 100,000 iterations using the full conditional distributions derived above.
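A sketch of the whole sampler under these settings, assembled from the full conditional sketches above (y is assumed to hold the data; burn-in and thinning are omitted for brevity):
\begin{verbatim}
set.seed(1)
# Hyperparameters from the text
a <- 1; b <- 1
mu.0 <- 140; tau2.0 <- 625
sigma2.0 <- 625; nu.0 <- 5

n <- length(y); S <- 100000
draws <- matrix(NA, S, 5,
                dimnames = list(NULL, c("mu1","mu2","sigma2.1","sigma2.2","p")))

# Starting values (the prior guesses from the Introduction)
mu1 <- 110; mu2 <- 150; sigma2.1 <- 100; sigma2.2 <- 64; p <- 0.5

for (s in 1:S) {
  # 1. Component labels
  p1 <- p * dnorm(y, mu1, sqrt(sigma2.1))
  p0 <- (1 - p) * dnorm(y, mu2, sqrt(sigma2.2))
  x  <- rbinom(n, 1, p1 / (p1 + p0))

  # 2. Mixing proportion
  p <- rbeta(1, sum(x) + a, n - sum(x) + b)

  # 3. Component means
  y1 <- y[x == 1]; n1 <- length(y1)
  y2 <- y[x == 0]; n2 <- length(y2)
  v1  <- 1 / (1 / tau2.0 + n1 / sigma2.1)
  mu1 <- rnorm(1, (mu.0 / tau2.0 + sum(y1) / sigma2.1) * v1, sqrt(v1))
  v2  <- 1 / (1 / tau2.0 + n2 / sigma2.2)
  mu2 <- rnorm(1, (mu.0 / tau2.0 + sum(y2) / sigma2.2) * v2, sqrt(v2))

  # 4. Component variances (via the precisions)
  sigma2.1 <- 1 / rgamma(1, (nu.0 + n1) / 2,
                         (nu.0 * sigma2.0 + sum((y1 - mu1)^2)) / 2)
  sigma2.2 <- 1 / rgamma(1, (nu.0 + n2) / 2,
                         (nu.0 * sigma2.0 + sum((y2 - mu2)^2)) / 2)

  draws[s, ] <- c(mu1, mu2, sigma2.1, sigma2.2, p)
}
\end{verbatim}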
Diagnostics
Looking at Figure 2 for the posterior medians and 95% credible intervals for all of the parameters of interest, our initial guesses for the $\mu_j$ values turn out to be quite close, but our original guesses for the $\sigma_j^2$ values were not close at all. It also appears that about 46.7% of people come from the distribution with the smaller mean, and the rest come from the one with the larger mean.
\begin{tabular}{lccc}
Parameter & 2.5\% & 50\% & 97.5\% \\
\hline
$\mu_1$ & 108.87 & 110.79 & 112.70 \\
$\mu_2$ & 143.10 & 146.43 & 149.12 \\
$\sigma_1^2$ & 106.65 & 129.55 & 158.89 \\
$\sigma_2^2$ & 187.83 & 236.89 & 306.44 \\
$p$ & 0.393 & 0.467 & 0.529 \\
\end{tabular}
Figure 2: Posterior Medians and 95\% Credible Intervals
From both Figure 2 and Figure 3, one can see that $\mu_2$ has a wider spread than $\mu_1$. This is another consequence of the significant number of extreme values on the high end of the original data set, which makes $\mu_2$ vary more.
According to the Gibbs sampler, if we take a sample of $\tilde{Y}$ from the corresponding $\mu_j$ and $\sigma_j$ draws, we get a similar-looking kernel density, which can be seen in Figure 4. The shape of the distribution with the smaller mean from the original sample looks quite similar to the mixture model; however, the distribution
Figure 3: Posterior Density of $\mu_1$ and $\mu_2$
with the larger mean doesn't quite match up. That's because when we run the Gibbs sampler, $\sigma_2 = 15.45$, which is quite high. I think this happens because a couple of extreme values on the high end of the original sample are inflating the variance of the higher-mean distribution.
Figure 4: Kernel Densities
I also found that an individual whose $y$ value is 120 has a 78% chance of coming from the distribution with the smaller mean.
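This kind of probability can be computed by averaging the label probability from the full conditional of $X_i$ over the posterior draws; a sketch using the draws matrix from the sampler sketch above:
\begin{verbatim}
# Posterior probability that an individual with y = 120 belongs to the
# smaller-mean component, averaged over the posterior draws.
y.new <- 120
num <- draws[, "p"] * dnorm(y.new, draws[, "mu1"], sqrt(draws[, "sigma2.1"]))
den <- num + (1 - draws[, "p"]) *
       dnorm(y.new, draws[, "mu2"], sqrt(draws[, "sigma2.2"]))
mean(num / den)
\end{verbatim}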
Looking at the autocorrelation functions in Figure 5, we can see that the $\mu_j$ chains are quite correlated at short lags, but as the lag grows they eventually become uncorrelated. This is why I ran such a large number of iterations of the Gibbs sampler: so I could get large enough effective sample sizes for both of the $\mu_j$ chains. Those effective sample sizes were 5111.7 for $\mu_1$ and 3952.5 for $\mu_2$.
Figure 5: Autocorrelation Functions
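For reference, effective sample sizes like those quoted above can be computed from the chains with the coda package (assuming it is installed):
\begin{verbatim}
library(coda)
# Effective sample sizes for the two mean chains.
effectiveSize(mcmc(draws[, c("mu1", "mu2")]))
\end{verbatim}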