$29
1. (30 marks) Consider the 3 pairs of side-by-side box plots given below, which were drawn in R. The data for this question is in the le \juries.csv".
(a) (10 marks) Recreate the side-by-side box plots and include the last four digits of your student number in the title. For each pair of box plots, provide the single line of R code, beginning with the R function- boxplot used to recreate it.
(b) (15 marks) For each box plot specify the values of the following - the rst quartile, the second quartile, the third quartile, the end points of the two whiskers, the extreme (outlier) points (identi ed as small circles), if any, and the limits of the 1.5IQR Rule Range.
(c) (5 marks) Compare the three pairs of box plots. Which pair best represents the data and why?
2. (60 marks) Consider the data, \bbw99.csv" based on the birth weights of 99 babies, along with the smoking status of their mothers. Answer the questions that follow. The variables in the dataset are:
id- an identi cation number from 1 to 99 bwt- baby birth weight in ounces
smoke- smoking status of mother (coded as 0, if she was a non-smoker, and 1, otherwise)
(a) (5 marks) Which variables are categorical? Name the levels of each categorical variable.
(b) (20 marks) Conduct an appropriate hypothesis test to determine whether there is a dif-ference in the mean birth weight between babies born to mothers who were smokers and babies born to mothers who were nonsmokers. Include the following:
i. Side-by-side boxplots
ii. Null and Alternative Hypotheses
iii. A test statistic and it’s distribution
2
iv. Test assumptions
v. Test diagnostics (checking model assumptions)
vi. P-value
vii. Results (brief discussion and conclusion)
(c) (5 marks) Name two(2) statistical methods which are equivalent to your method used in part (b) above.
(d) (25 marks) Create a subset of the data by removing the row of observations whose ‘id’ matches the last 2 digits of your student number . For instance, this can be done in R by
shivon:subset < shivon:data[ 1; ] if my student number ends with ‘01’.
Then redo the analyses of part (b) above with your data subset.
(e) (5 marks) Compare your results of part (b) and part (d). Do you think that the observation removed was in uential?
3