The files Stat261-Assign5-R-2018.pdf and StudentPerformanceData.pdf are required to complete this assignment. The file Stat261-Assign5-R-2018.Rmd contains the code that generated Stat261-Assign5-R-2018.pdf. Section numbers in the assignment questions refer to the sections in Stat261-Assign5-R-2018.pdf.
We will use the variables G3, SEX, and WALC in this data set. Look up the definitions of these variables in the file StudentPerformanceData.pdf.
1. Using the information (summary and graphs) in Section 1.1, comment on the variables G3,
SEX, and WALC.
2. Section 1.2 contains an analysis of the first 10 grade observations.
(a) Comment on the distribution of these 10 observations, using Figures 3 and 4.
(b) What is a 95% confidence interval for the mean of the 10 observations?
(c) Using the 10 observations, perform a test of the hypothesis that the mean grade is 10. Include a concluding sentence which could be incorporated into a report about this data to your boss. Hint: the required computed quantities are given in this section.
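As a hedged illustration of the kind of calculation this question involves, the sketch below computes a 95% confidence interval and a one-sample t statistic in Python. The `grades` values are hypothetical, not the assignment data, and the critical value 2.262 is the 0.975 quantile of Student's t with 9 degrees of freedom taken from a t table, so it applies only when n = 10. The quantities you need for the assignment itself are already printed in Section 1.2.

```python
import math
import statistics

# Hypothetical grades, NOT the assignment data.
grades = [8.0, 12.0, 9.0, 14.0, 11.0, 10.0, 13.0, 7.0, 15.0, 11.0]
mu0 = 10.0  # hypothesised mean under H0

n = len(grades)
mean = statistics.mean(grades)
s = statistics.stdev(grades)   # sample SD, n - 1 in the denominator
se = s / math.sqrt(n)          # estimated standard error of the mean

# 0.975 quantile of Student's t with 9 df, copied from a t table
# (assumption: only valid for n = 10).
T_CRIT_DF9 = 2.262

ci_lower = mean - T_CRIT_DF9 * se
ci_upper = mean + T_CRIT_DF9 * se
t_stat = (mean - mu0) / se     # one-sample t statistic on n - 1 df

print(f"mean = {mean:.3f}, s = {s:.3f}, se = {se:.3f}")
print(f"95% CI: ({ci_lower:.3f}, {ci_upper:.3f})")
print(f"t statistic = {t_stat:.3f} on {n - 1} df")
```

A two-sided test at the 5% level rejects the hypothesised mean exactly when it falls outside this interval, which is the same as |t| exceeding the critical value.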
3. We are interested in comparing the grades for boys and girls in Section 1.3.
(a) Comment on Figure 5.
(b) Assuming that the variances for the boys and girls are equal, perform a test of the hypothesis that the mean grade for boys is the same as the mean grade for girls. Include a concluding sentence which could be incorporated into a report about this data to your boss.
(c) Without assuming that the variances for the boys and girls are equal, perform a test of the hypothesis that the mean grade for boys is the same as the mean grade for girls. Include a concluding sentence which could be incorporated into a report about this data to your boss.
(d) Were your conclusions for the two tests above the same? Why do you suppose that is?
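The two tests in this question differ only in how the standard error of the difference in means, and the degrees of freedom, are computed. The sketch below shows both calculations side by side in Python. The two samples are hypothetical, not the assignment data, and the variable names are illustrative only.

```python
import math
import statistics

# Hypothetical grades for two groups, NOT the assignment data.
group_a = [10.0, 12.0, 9.0, 14.0, 11.0, 13.0]
group_b = [11.0, 8.0, 12.0, 10.0, 9.0]

n1, n2 = len(group_a), len(group_b)
m1, m2 = statistics.mean(group_a), statistics.mean(group_b)
v1, v2 = statistics.variance(group_a), statistics.variance(group_b)

# Equal-variance test: pool the two sample variances.
pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_pooled = (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
df_pooled = n1 + n2 - 2

# Unequal-variance (Welch) test: separate variances and
# Satterthwaite's approximate degrees of freedom.
t_welch = (m1 - m2) / math.sqrt(v1 / n1 + v2 / n2)
df_welch = (v1 / n1 + v2 / n2) ** 2 / (
    (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
)

print(f"pooled: t = {t_pooled:.3f} on {df_pooled} df")
print(f"Welch:  t = {t_welch:.3f} on {df_welch:.2f} df")
```

Each statistic is then compared with Student's t distribution on the stated degrees of freedom. The Welch degrees of freedom are generally not an integer.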
4. In Section 1.4, we are interested in whether there is a relationship between student grades and WALC, weekend alcohol consumption.
(a) Comment on Figures 6 and 7.
(b) The results from fitting a straight line model to the grades as a function of weekend alcohol consumption are given on page 10. What is the estimated straight line model for this data?
(c) Perform a test of the hypothesis that the slope parameter is zero. Include a concluding sentence which could be incorporated into a report about this data to your boss.
(d) Figure 8 is a Q-Q plot of the residuals from the straight line model fit. Comment on this plot.
(e) Figure 10 contains the side-by-side boxplots of the grades by WALC, with the fitted line drawn on top. Comment on this graph.
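For readers who want to see where the numbers in a straight-line fit summary come from, here is a minimal Python sketch of the least squares estimates and the t statistic for the slope. The `x` and `y` values are hypothetical, not the assignment data. The assignment's own estimates are the ones reported on page 10.

```python
import math

# Hypothetical (x, y) pairs, NOT the assignment data.
x = [1, 1, 2, 2, 3, 3, 4, 5]
y = [12, 11, 11, 10, 10, 9, 9, 8]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least squares estimates of the line y = intercept + slope * x.
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Residual standard error on n - 2 degrees of freedom.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
sse = sum(r ** 2 for r in residuals)
resid_se = math.sqrt(sse / (n - 2))

# Standard error of the slope and t statistic for H0: slope = 0.
se_slope = resid_se / math.sqrt(sxx)
t_slope = slope / se_slope

print(f"fitted line: y = {intercept:.3f} + {slope:.3f} x")
print(f"t statistic for slope = {t_slope:.3f} on {n - 2} df")
```

The t statistic is compared with Student's t distribution on n - 2 degrees of freedom, which is the comparison a regression summary table reports as the p-value for the slope.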
BONUS QUESTIONS:
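1. (BONUS) Suppose that $Y_1, Y_2, \ldots, Y_n$ are independent $N(\alpha, \sigma^2)$. Show that if $\sigma$ is unknown, the likelihood ratio statistic for testing $H_0: \alpha = \alpha_0$ is given by
$$D = n \ln\left(1 + \frac{T^2}{n - 1}\right), \quad \text{where} \quad T = \frac{\hat{\alpha} - \alpha_0}{s/\sqrt{n}}.$$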
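A numerical sanity check can be useful before attempting the algebra. The Python sketch below compares the stated formula for D with twice the difference in maximised log likelihoods, computed directly. It is a check on hypothetical data, not a proof, and it assumes the standard result that for a fixed mean the normal log likelihood is maximised over the variance at the average squared deviation from that mean.

```python
import math

# Hypothetical observations, NOT the assignment data.
y = [8.0, 12.0, 9.0, 14.0, 11.0, 10.0, 13.0, 7.0, 15.0, 11.0]
alpha0 = 10.0  # hypothesised mean under H0

n = len(y)
alpha_hat = sum(y) / n


def max_loglik(alpha):
    """Normal log likelihood at alpha, maximised over sigma.

    Additive constants are dropped. For fixed alpha the maximising
    variance is the mean squared deviation about alpha (assumed here).
    """
    sigma2_hat = sum((yi - alpha) ** 2 for yi in y) / n
    return -0.5 * n * math.log(sigma2_hat) - 0.5 * n


# Likelihood ratio statistic computed directly.
d_direct = 2 * (max_loglik(alpha_hat) - max_loglik(alpha0))

# The same statistic from the formula in the question.
s = math.sqrt(sum((yi - alpha_hat) ** 2 for yi in y) / (n - 1))
t_stat = (alpha_hat - alpha0) / (s / math.sqrt(n))
d_formula = n * math.log(1 + t_stat ** 2 / (n - 1))

print(f"direct:  D = {d_direct:.6f}")
print(f"formula: D = {d_formula:.6f}")
```

Agreement of the two values on one data set does not establish the identity, but a mismatch would signal an algebra slip.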
2. (BONUS) Testing equality of variances. Consider $k$ independent normal samples of sizes $n_1, n_2, \ldots, n_k$. Measurements from sample $i$ have unknown variance $\sigma_i^2$. Let $s_1^2, s_2^2, \ldots, s_k^2$ be the sample variances computed from the sample data, which are estimates of $\sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2$. Since the measurements are normally distributed, we know that
$$\frac{(n_i - 1)s_i^2}{\sigma_i^2} \sim \chi^2_{(n_i - 1)} \quad \text{for } i = 1, 2, \ldots, k.$$
Using the above distribution, the log likelihood for $\sigma_i$ is therefore
$$\ell(\sigma_i) = -(n_i - 1)\ln\sigma_i - \frac{(n_i - 1)s_i^2}{2\sigma_i^2}.$$
(a) Find the joint log likelihood function of $\sigma_1, \sigma_2, \ldots, \sigma_k$ and show that it is maximized for $\hat{\sigma}_i^2 = s_i^2$, $i = 1, 2, \ldots, k$.
(b) Show that if $\sigma_1 = \sigma_2 = \cdots = \sigma_k = \sigma$, then the MLE of $\sigma^2$ is given by
$$s_{\text{pooled}}^2 = \left(\sum_{i=1}^{k} (n_i - 1)s_i^2\right) \Big/ \left(\sum_{i=1}^{k} (n_i - 1)\right).$$
(c) Show that the likelihood ratio statistic for testing $H_0: \sigma_1 = \sigma_2 = \cdots = \sigma_k = \sigma$ is given by
$$D = \sum_{i=1}^{k} (n_i - 1)\ln\left(s_{\text{pooled}}^2 / s_i^2\right).$$
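The formula in part (c) can be checked numerically against the log likelihood given in the question. The Python sketch below does this for three hypothetical samples. It takes the maximisers from parts (a) and (b) as given rather than deriving them, so it is a consistency check under those assumptions and not a solution.

```python
import math
import statistics

# Three hypothetical samples, NOT the assignment data.
samples = [
    [10.0, 12.0, 9.0, 14.0, 11.0, 13.0],
    [11.0, 8.0, 12.0, 10.0, 9.0],
    [15.0, 7.0, 12.0, 18.0, 9.0, 11.0, 14.0],
]
dfs = [len(sample) - 1 for sample in samples]             # n_i - 1
variances = [statistics.variance(sample) for sample in samples]  # s_i^2


def loglik(sigma2, df, s2):
    """l(sigma_i) from the question, written in terms of sigma_i^2."""
    return -0.5 * df * math.log(sigma2) - df * s2 / (2 * sigma2)


# Pooled variance from part (b), taken as given.
pooled = sum(df * s2 for df, s2 in zip(dfs, variances)) / sum(dfs)

# Maximised joint log likelihood without and with the H0 restriction.
l_unrestricted = sum(loglik(s2, df, s2) for df, s2 in zip(dfs, variances))
l_restricted = sum(loglik(pooled, df, s2) for df, s2 in zip(dfs, variances))
d_direct = 2 * (l_unrestricted - l_restricted)

# The statistic from the formula in part (c).
d_formula = sum(df * math.log(pooled / s2) for df, s2 in zip(dfs, variances))

print(f"direct:  D = {d_direct:.6f}")
print(f"formula: D = {d_formula:.6f}")
```

The statistic is zero when all sample variances are equal and grows as they spread apart.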