Homework 1, Generalized linear models Solution




1 Flies




This is Ex. 6.81 from Faraway (2005). One hundred twenty-five fruit flies were divided randomly into five groups of 25 each. The response was the lifetime of the fruit fly in days. One group was kept solitary, while another was kept individually with a virgin female each day. Another group was given eight virgin females per day. As an additional control, the fourth and fifth groups were kept with one or eight pregnant females per day (pregnant fruit flies will not mate). The thorax length of each male was measured, as this was known to affect lifetime. The data set is fruitfly in the faraway package. A complete reference to the data is given in the help file for the dataset.




data('fruitfly', package='faraway')




summary(fruitfly)




##      thorax         longevity         activity
##  Min.   :0.6400   Min.   :16.00   isolated:25
##  1st Qu.:0.7600   1st Qu.:46.00   one     :25
##  Median :0.8400   Median :58.00   low     :25
##  Mean   :0.8224   Mean   :57.62   many    :24
##  3rd Qu.:0.8800   3rd Qu.:70.00   high    :25
##  Max.   :0.9400   Max.   :97.00







Use a Gamma generalized linear model to model the lifetimes as a function of the thorax length and activity. Write a brief report (a half to one page of writing) summarizing the problem and the model used, and interpreting the coefficients in your model in terms of their effect on expected lifetime. Write a one-paragraph, non-technical summary of the results that might appear in a “Research News” media article about the laboratory in question.
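One way such a model could be written down is sketched here; it is not the intended solution. The log link, the centring constant and all symbols are assumptions: Y_i is the lifetime of fly i, x_i its thorax length, and k indexes the four non-isolated activity groups.

```latex
\begin{aligned}
Y_i &\sim \mathrm{Gamma}(\nu,\ \mu_i / \nu)
  \quad \text{(shape } \nu \text{, scale } \mu_i / \nu \text{, so } E(Y_i) = \mu_i \text{)}, \\
\log \mu_i &= \beta_0 + \beta_1 (x_i - \bar{x})
  + \sum_{k=1}^{4} \gamma_k \, I(\text{activity}_i = k).
\end{aligned}
```

Under this parameterisation exp(γ_k) is the ratio of the expected lifetime in group k to that of an isolated fly with the same thorax length. R's Gamma() family uses the inverse link by default, so a log link would have to be requested explicitly.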




Hints




• consider centering and rescaling variables

• don’t show R code in your answer, but putting your code in an appendix might help the marker

• format tables and figures nicely

• the code below does not fit a useful model, but it might help you get started



glm(thorax ~ longevity + activity, family=Gamma(), data=fruitfly)
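The centering hint can be sketched as follows. This is an illustration under stated assumptions, not the expected answer: the reference value 0.8 and the 0.1 scaling are arbitrary choices, thoraxC and flyFit are made-up names, and the log link is not prescribed by the exercise.

```r
# Sketch only: assumes the faraway package is installed
data('fruitfly', package='faraway')

# centre thorax length at 0.8 and rescale so one unit is 0.1
# (both constants are illustrative choices, not given in the exercise)
fruitfly$thoraxC = (fruitfly$thorax - 0.8) / 0.1

# lifetime as the response, with a log link (an assumption)
flyFit = glm(longevity ~ thoraxC + activity,
  family=Gamma(link='log'), data=fruitfly)

# natural scale: the intercept becomes the expected lifetime of an
# isolated fly with thorax length 0.8; other terms act multiplicatively
exp(coef(flyFit))
```

With the covariate centred, the exponentiated intercept refers to a fly that could plausibly exist, rather than to one with a thorax length of zero.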







2 Smoking




Over the course of the next 13 weeks you will be using the 2014 American National Youth Tobacco Survey to become an expert in all matters pertaining to the use of cigars, hookahs, and chewing tobacco amongst American school children. MS Access and SAS versions of the survey data are available from the Survey’s web page. On the pbrown.ca/appliedstats/astwo/data page there is an R version of the 2014 dataset smoke.RData, a pdf documentation file 2014-Codebook.pdf, and the code used to create the R version of the data smokingData.R.




The research hypotheses to be investigated using this survey are as follows.




Regular use of chewing tobacco, snuff or dip is no more common amongst Americans of European ancestry than for Hispanic-Americans and African-Americans, once one accounts for the fact that white Americans are more likely to live in rural areas and chewing tobacco is a rural phenomenon.
The likelihood of having used a hookah or waterpipe on at least one occasion is the same for two individuals of different sexes, provided their age, ethnicity, and other demographic characteristics are similar.



Write a short consulting report addressing these hypotheses. This should include the following:




• a one-paragraph summary stating your conclusions, which could be understood by a child health and welfare professional or an executive in the marketing department of a large tobacco firm;

• a writeup of roughly one page of text (not including figures and tables) containing

– an introduction restating the problem as you’ve interpreted it in relation to this dataset,

– a methods section giving the statistical models used (in mathematical notation, not R syntax) and justifying their use, and

– a results section where the results are described and interpreted; and

• an appendix containing your code.



The report will be assessed in terms of:




• clarity of presentation,

• the use of an appropriate model and implementing it correctly,

• demonstration of an understanding of the statistical models used, and

• drawing conclusions which are consistent with the analysis.



The data




You can obtain the data with:




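dataDir = "../data"
smokeFile = file.path(dataDir, "smokeDownload.RData")
# create the data directory if needed, so the download has somewhere to go
dir.create(dataDir, showWarnings = FALSE)
if (!file.exists(smokeFile)) {
  download.file("http://pbrown.ca/teaching/appliedstats/data/smoke.RData", smokeFile)
}
# load() returns the names of the objects it creates
(load(smokeFile))

## [1] "smoke" "smokeFormats"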




The smoke object is a data.frame containing the data; smokeFormats gives some explanation of the variables. The colName and label columns of smokeFormats contain variable names in smoke and their descriptions, respectively.




chewing_tobacco_snuff_or: RECODE: Used chewing tobacco, snuff, or dip on 1 or more days in the past 30 days



ever_tobacco_hookah_or_wa: RECODE: Ever smoked tobacco out of a hookah or waterpipe
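A lookup of this kind might be sketched as below. So that the snippet runs without the download, it builds a two-row stand-in (smokeFormatsDemo, a made-up name) holding only the two entries quoted above; lookupLabel is likewise a hypothetical helper. With the real data loaded, smokeFormats could be passed in its place.

```r
# Stand-in for smokeFormats containing only the two rows quoted above
smokeFormatsDemo = data.frame(
  colName = c("chewing_tobacco_snuff_or", "ever_tobacco_hookah_or_wa"),
  label = c(
    "RECODE: Used chewing tobacco, snuff, or dip on 1 or more days in the past 30 days",
    "RECODE: Ever smoked tobacco out of a hookah or waterpipe"),
  stringsAsFactors = FALSE)

# hypothetical helper: return the description for one variable name
lookupLabel = function(formats, name) {
  formats$label[formats$colName == name]
}

lookupLabel(smokeFormatsDemo, "ever_tobacco_hookah_or_wa")
```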



The script smokingData.R has changed the data in a few ways.




• RuralUrban is a flag denoting whether the school the respondent attended was rural or urban.

• Race is an R factor recoded from RaceEth_no_mult_grp.

• Ages have been converted to years from the original categorical variables described in the pdf file.



Some words of advice




• Write in sentences and paragraphs.

• Provide captions for ALL figures and tables.

• Don’t use default axis labels on plots, and ensure text on plots is large enough to read comfortably.

• Round numbers to 2 or 3 decimal places so tables look tidy.

• Don’t show raw R output. Put things in LaTeX or Markdown tables (using knitr::kable or Hmisc::latex).

• Give parameter estimates and confidence intervals on the ‘natural’ scale where possible (probabilities or odds rather than log-odds ratios).
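The last few points could be combined along the following lines. This is a sketch on simulated data, not the survey: simData, simFit and oddsTable are made-up names, the intervals are simple Wald intervals from confint.default, and knitr is assumed to be installed.

```r
# Hypothetical simulated data standing in for a real analysis
set.seed(1)
simData = data.frame(age = rep(10:18, each = 50))
simData$used = rbinom(nrow(simData), 1, plogis(-1 + 0.3 * (simData$age - 14)))

# centre age at 14 so the intercept refers to a 14 year old
simFit = glm(used ~ I(age - 14), family = binomial, data = simData)

# Wald intervals on the log-odds scale, exponentiated to the odds scale
oddsTable = exp(cbind(Estimate = coef(simFit), confint.default(simFit)))
rownames(oddsTable) = c("Odds at age 14", "Odds ratio per year of age")

# rounded, captioned table instead of raw R output
knitr::kable(oddsTable, digits = 2,
  caption = "Odds of use and odds ratio for age, with 95% Wald intervals (simulated data).")
```

Exponentiating the interval endpoints, rather than the standard errors, keeps the intervals valid on the odds scale.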



Hints




• get rid of 9 year olds because their data is suspicious




smokeSub = smoke[smoke$Age >= 10, ]




• fit a model incapable of answering the research question




glm(ever_tobacco_pipe_not_hoo ~ RuralUrban + Race + Age, family=binomial, data=smokeSub)




##
## Call:  glm(formula = ever_tobacco_pipe_not_hoo ~ RuralUrban + Race +
##     Age, family = binomial, data = smokeSub)
##
## Coefficients:
##     (Intercept)  RuralUrbanRural        Raceblack     Racehispanic
##        -8.72748          0.20722         -1.23664         -0.15502
##       Raceasian       Racenative      Racepacific              Age
##        -0.97685         -0.04943         -0.51884          0.35989
##
## Degrees of Freedom: 20027 Total (i.e. Null);  20020 Residual
##   (1939 observations deleted due to missingness)
## Null Deviance:     5884
## Residual Deviance: 5468    AIC: 5484



Looks like white kids smoke pipes more than anyone else.
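That reading can be checked on the odds scale by exponentiating the printed coefficients. The sketch below copies the estimates from the output above by hand; logOdds and oddsRatios are made-up names. No white level appears among the coefficients, so white is presumably the reference level of Race, which is an assumption here.

```r
# coefficients copied from the printed output (log-odds scale)
logOdds = c(RuralUrbanRural = 0.20722, Raceblack = -1.23664,
  Racehispanic = -0.15502, Raceasian = -0.97685,
  Racenative = -0.04943, Racepacific = -0.51884, Age = 0.35989)

# exponentiate to get odds ratios relative to the reference group
oddsRatios = exp(logOdds)
round(oddsRatios, 2)

# TRUE if every race group has lower odds than the reference level
all(oddsRatios[grep('^Race', names(oddsRatios))] < 1)
```

These are point estimates only; without confidence intervals they say nothing about which differences are distinguishable from noise.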







References




Faraway, J.J. (2005). Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models. Chapman & Hall/CRC Texts in Statistical Science. CRC Press. url: http://www.tandfebooks.com/isbn/9780203492284.



























