*Please be sure to submit your assignment by 11:55ish pm (or before) to prevent any glitches in the upload from precluding your timely submission.
*Please work well in advance, getting help during office hours and labs, as there will be no extensions given for this assignment, outside of extreme, extenuating circumstances which must be communicated in advance to the primary instructor.
There is 1 problem with several parts (1a-1g) in this homework assignment. Please double-check that you have provided a response for each part of the problem before you submit.
BST 210 Problem set policies:
We encourage you to discuss homework with your fellow students (or with the instructor or the TAs), but you must write your own final answers, in your own words.
Please include the appropriate computer output in your solution if that helps you to answer a question, but be sure to interpret your findings in words – submitting only output is not sufficient for full credit.
Homework assignments will not be accepted late (other than for extreme emergency, but the primary instructor must be reached in advance).
Be complete, but not verbose, in your responses to receive full credit.
All homework must be submitted online via Canvas by 11:59pm on Tuesday.
Problem 1
Suppose you wish to design a prospective cohort study assessing whether obesity (BMI ≥ 30) is associated with presence or absence of coronary heart disease (CHD) or time to CHD. Because we don’t want to wait 24 years to collect our data, a consistent four-year follow-up is planned for each subject. Future subjects are expected to look similar to those in the Framingham study.
A pilot study of 250 Framingham-like subjects is run, excluding subjects who already had prevalent CHD. First, we look at presence or absence of obesity to predict a binary four-year CHD incidence. Subjects who died within four years without having a prior CHD are viewed as not developing CHD. Among the 250 subjects, 3 of 31 obese (BMI ≥ 30) subjects developed CHD within 4 years, and 11 of 219 non-obese subjects developed CHD within 4 years. Using logistic regression, the estimated odds ratio comparing obese vs. non-obese subjects is 2.025974 with 95% confidence interval (0.5325236, 7.707772).
Next, we look at presence or absence of obesity to predict time to CHD, with right censoring occurring at four years for subjects who did not have CHD by that time. Subjects who died without having a prior CHD were viewed as being censored at their time of death. Among these same 250 subjects, the estimated hazard ratio from the proportional hazards model comparing obese vs. non-obese subjects is 1.990321 with 95% confidence interval (0.5552206, 7.134780). Not surprisingly, given only 250 subjects, we do not reach statistical significance with either analysis. View these data as informative historical data (and ignore the analysis of the full Framingham data further below for now).
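The pilot odds ratio and its interval can be reproduced from the 2x2 counts alone. The following is a minimal Python sketch, assuming the reported interval is a Wald interval on the log odds ratio scale (which is what a logistic regression with a single binary covariate yields). The function name `odds_ratio_wald` is illustrative and not part of the assignment.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def odds_ratio_wald(cases_exp, n_exp, cases_unexp, n_unexp, level=0.95):
    """Odds ratio with a Wald confidence interval on the log scale."""
    a, b = cases_exp, n_exp - cases_exp        # exposed: cases, non-cases
    c, d = cases_unexp, n_unexp - cases_unexp  # unexposed: cases, non-cases
    or_hat = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return or_hat, exp(log(or_hat) - z * se_log), exp(log(or_hat) + z * se_log)


# Pilot data: 3 of 31 obese and 11 of 219 non-obese subjects developed CHD.
pilot_or, pilot_lo, pilot_hi = odds_ratio_wald(3, 31, 11, 219)
print(f"OR = {pilot_or:.4f}, 95% CI ({pilot_lo:.4f}, {pilot_hi:.4f})")
```

The wide interval reflects the small number of events (14 in total), which dominates the standard error of the log odds ratio.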
1a. Determine the sample size needed for 90% power in a two-sided 0.05 level test to compare proportions of incident CHD over four years in obese vs. non-obese subjects if, under the alternative hypothesis, we had proportions with incident CHD as observed in the 250 subjects. Keep the proportions of obese and non-obese subjects the same as observed in the pilot study.
1b. Determine the sample size needed for 90% power in a two-sided 0.05 level log-rank test to compare times to CHD in obese vs. non-obese subjects if, under the alternative hypothesis, we have a hazard ratio as observed in the 250 subjects. Keep the proportions of obese vs. non-obese subjects the same as observed, as well as the proportion of censored observations.
1c. How do the sample sizes change if we design each of the studies above to have an equal number of obese and non-obese subjects? Is the total sample size larger or smaller? Does that make intuitive sense? Briefly comment on the feasibility of such a design.
1d. If you had to pick “one primary outcome” for your study, would you prefer to design this study to have a binary or a time-to-event outcome? Briefly justify your choice.
1e. In practice, one would use more “rounded” values for ORs, HRs, or proportions of obese or censored observations than were used above, as you would not exactly believe the estimates from the sample of 250. It would also be important to perform a range of calculations to show power or sample size under different scenarios. Using your “one primary outcome” selected above, develop an appropriate sample size based on your parameter of interest (either OR or HR) of 2.0 for 90% power for a two-sided 0.05 level test. Then, develop a table and/or graph for a range of 1.5 to 2.5 in increments of 0.1 for your parameter of interest, for a fixed sample size, to show how power changes under different scenarios. Also include a sentence or two summarizing your results that would be appropriate to include in a protocol or grant application. Feel free to add or adjust anything here that seems reasonable to you – different reasonable researchers might use different assumptions and so would end up with different sample sizes, so be sure that your sentences summarizing your results include all necessary information about your assumptions.
1f. Analyses from the “full” Framingham data set are below, restricting follow-up to four years to develop CHD for each subject and eliminating subjects with prevalent CHD or missing BMI. How close were your assumptions in your sample size calculations to the “truth” from the full data set? Were the sample sizes that were actually achieved sufficient for high power? Briefly comment.
1g. A colleague noted that the relative risk, the odds ratio, and the hazard ratio estimates in these analyses were similar, and that logistic regression, the log-rank test, and the Cox model gave similar P-values. Briefly discuss whether these observations make sense or not.
(Note that even more work would be needed if we also wanted to account for other covariates or confounders, such as adjusting for gender or age, in our comparisons of obese vs. non-obese subjects.)
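One possible way to organize the calculations asked for above is sketched below in Python. It rests on assumptions that are not part of the assignment: a normal-approximation formula (no continuity correction) for comparing two proportions with unequal allocation, Schoenfeld's approximation for the number of events needed by a log-rank test, and a four-year event proportion equal to the pilot value of 14/250. All function names are illustrative. Power routines in statistical packages may use somewhat different formulas and so return somewhat different numbers.

```python
from math import ceil, log, sqrt
from statistics import NormalDist

Z = NormalDist()


def total_n_two_proportions(p_exp, p_unexp, frac_exp, alpha=0.05, power=0.90):
    """Total N for a two-sided test of two proportions (normal approximation,
    no continuity correction); frac_exp is the fraction of exposed subjects."""
    z_a = Z.inv_cdf(1 - alpha / 2)
    z_b = Z.inv_cdf(power)
    p_bar = frac_exp * p_exp + (1 - frac_exp) * p_unexp
    sd_null = sqrt(p_bar * (1 - p_bar) * (1 / frac_exp + 1 / (1 - frac_exp)))
    sd_alt = sqrt(p_exp * (1 - p_exp) / frac_exp
                  + p_unexp * (1 - p_unexp) / (1 - frac_exp))
    return ceil(((z_a * sd_null + z_b * sd_alt) / (p_exp - p_unexp)) ** 2)


def events_logrank(hr, frac_exp, alpha=0.05, power=0.90):
    """Schoenfeld's approximate number of events for a two-sided log-rank test."""
    z_a = Z.inv_cdf(1 - alpha / 2)
    z_b = Z.inv_cdf(power)
    return ceil((z_a + z_b) ** 2 / (frac_exp * (1 - frac_exp) * log(hr) ** 2))


def power_logrank(hr, n_events, frac_exp, alpha=0.05):
    """Approximate log-rank power for a given number of events."""
    z_a = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(sqrt(n_events * frac_exp * (1 - frac_exp)) * abs(log(hr)) - z_a)


frac_obese = 31 / 250   # pilot allocation
p_event = 14 / 250      # pilot four-year event proportion (assumed to carry over)

n_binary = total_n_two_proportions(3 / 31, 11 / 219, frac_obese)
n_binary_balanced = total_n_two_proportions(3 / 31, 11 / 219, 0.5)
d_unbalanced = events_logrank(1.990321, frac_obese)
n_survival = ceil(d_unbalanced / p_event)  # subjects needed to observe that many events
print(n_binary, n_binary_balanced, d_unbalanced, n_survival)

# Power across hazard ratios 1.5 to 2.5 for a fixed number of events,
# here the number designed for HR = 2.0 at the pilot allocation.
d_design = events_logrank(2.0, frac_obese)
for tenths in range(15, 26):
    hr = tenths / 10
    print(f"HR = {hr:.1f}  power = {power_logrank(hr, d_design, frac_obese):.3f}")
```

The same `power_logrank` helper can be evaluated at the event count and exposure fraction of any candidate or completed design. Note that moving to a balanced design would probably also change the overall event proportion, so the conversion from events to subjects should be revisited in that case.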
Below are analysis results from the complete Framingham data set, with a binary outcome chd4 for whether or not CHD occurred within four years and a survival outcome timechd4 for time to CHD, with observations appropriately censored at four years (or earlier, if death without CHD occurred earlier), by obese status (= 1 for obese, = 0 for non-obese). These analyses exclude subjects with prevalent CHD or missing BMI values.
. cs chd4 obese, or

                 | obese                  |
                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+------------
           Cases |        39         168  |        207
        Noncases |       500        3514  |       4014
-----------------+------------------------+------------
           Total |       539        3682  |       4221
                 |                        |
            Risk |  .0723562    .0456274  |   .0490405
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Risk difference |         .0267288       |    .0038421    .0496156
      Risk ratio |         1.585807       |    1.132751    2.220067
      Odds ratio |           1.6315       |    1.139401    2.336332 (Cornfield)
                 +-------------------------------------------------
                               chi2(1) =     7.20  Pr>chi2 = 0.0073
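The point estimates in this table follow directly from the four cell counts. A short Python sketch of that arithmetic is given here as a check; the variable names are illustrative, and the chi-square shown is the Pearson statistic without continuity correction, which is assumed to be what the output reports.

```python
# Counts from the full-data 2x2 table (exposed = obese).
cases_exp, noncases_exp = 39, 500
cases_unexp, noncases_unexp = 168, 3514

n_exp = cases_exp + noncases_exp
n_unexp = cases_unexp + noncases_unexp
n_total = n_exp + n_unexp

risk_exp = cases_exp / n_exp
risk_unexp = cases_unexp / n_unexp
risk_diff = risk_exp - risk_unexp
risk_ratio = risk_exp / risk_unexp
odds_ratio = (cases_exp * noncases_unexp) / (noncases_exp * cases_unexp)

# Pearson chi-square for a 2x2 table, without continuity correction.
cross = cases_exp * noncases_unexp - noncases_exp * cases_unexp
chi2 = n_total * cross ** 2 / (
    (cases_exp + cases_unexp) * (noncases_exp + noncases_unexp) * n_exp * n_unexp
)

print(f"risks: {risk_exp:.7f} vs {risk_unexp:.7f}")
print(f"RD = {risk_diff:.7f}, RR = {risk_ratio:.6f}, OR = {odds_ratio:.4f}")
print(f"chi2(1) = {chi2:.2f}")
```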
. logistic chd4 obese

Logistic regression                             Number of obs     =       4221
                                                LR chi2(1)        =       6.46
                                                Prob > chi2       =     0.0111
Log likelihood = -822.73894                     Pseudo R2         =     0.0039

------------------------------------------------------------------------------
        chd4 | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       obese |     1.6315   .3002934     2.66   0.008     1.137405    2.340232
       _cons |   .0478088   .0037757   -38.50   0.000     .0409529    .0558124
------------------------------------------------------------------------------
. sts graph, by(obese) risktable

         failure _d:  chd4
   analysis time _t:  timechd4

[Figure: Kaplan-Meier survival estimates by obese status (obese = 0 vs. obese = 1); y-axis from 0.00 to 1.00, x-axis analysis time from 0 to 4.]

Number at risk
analysis time       0      1      2      3      4
obese = 0        3682   3640   3610   3564   3514
obese = 1         539    525    519    508    500
. sts test obese

         failure _d:  chd4
   analysis time _t:  timechd4

Log-rank test for equality of survivor functions

      |   Events         Events
obese |  observed       expected
------+-------------------------
0     |       168         181.02
1     |        39          25.98
------+-------------------------
Total |       207         207.00

            chi2(1) =       7.45
            Pr>chi2 =     0.0063
. stcox obese

         failure _d:  chd4
   analysis time _t:  timechd4

Cox regression -- Breslow method for ties

No. of subjects =         4221                  Number of obs    =        4221
No. of failures =          207
Time at risk    =  16483.55373
                                                LR chi2(1)       =        6.60
Log likelihood  =   -1719.5688                  Prob > chi2      =      0.0102

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       obese |   1.617147   .2874419     2.70   0.007     1.141436    2.291118
------------------------------------------------------------------------------
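When reading this table, it helps to know how the z statistic and interval relate to the hazard ratio and its standard error. The Python sketch below is a check, not part of the assignment. It assumes the standard error is reported on the ratio scale, so that dividing it by the hazard ratio gives the standard error of log(HR), and that the interval is a Wald interval computed on the log scale.

```python
from math import exp, log
from statistics import NormalDist

hr, se_hr = 1.617147, 0.2874419  # hazard ratio and its SE from the stcox output

# Delta method: SE on the ratio scale is HR * SE(log HR).
se_log = se_hr / hr
z = log(hr) / se_log
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

z_crit = NormalDist().inv_cdf(0.975)
ci_lo = exp(log(hr) - z_crit * se_log)
ci_hi = exp(log(hr) + z_crit * se_log)

print(f"z = {z:.2f}, P = {p_two_sided:.3f}, 95% CI ({ci_lo:.6f}, {ci_hi:.6f})")
```

The same relationship applies to the odds ratio row of the logistic regression output.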