STAT 402: Homework 4 Solved


Exercise 1: I want to survey a group of students on how much time they are spending on homework and studying. I have a frame of students that I can contact, along with their age, gender, major and year in college. Do you think it would be worth stratifying on one of these variables? Why or why not? If you were to use a variable to make strata, which would you pick and why?

Solution:

I think it is absolutely worth stratifying on one, if not two, of these variables. First, considering a student's major, it is easy to see that study time differs substantially between majors. There is also likely a substantial difference in study time between underclassmen and upperclassmen, as classes tend to scale up in difficulty and required study time. Therefore an SRS that mixes underclassmen and upperclassmen, and rigorous and less rigorous majors, will almost assuredly give a higher-variance estimate of study time than a stratified sample would.
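To make the variance argument concrete, here is a minimal R sketch. The stratum shares, means, and standard deviations are made-up illustrative numbers (not data from any survey), and it assumes proportional allocation with no finite population correction.

```r
# Hypothetical strata, e.g. a less and a more study-intensive group of majors.
W     <- c(0.5, 0.5)   # assumed stratum shares of the student frame
mu    <- c(10, 20)     # assumed mean weekly study hours in each stratum
sigma <- c(4, 4)       # assumed within-stratum standard deviations
n     <- 100           # total sample size

# Population variance splits into a within-stratum and a between-stratum part.
overall_mean <- sum(W * mu)
var_within   <- sum(W * sigma^2)
var_between  <- sum(W * (mu - overall_mean)^2)

var_srs   <- (var_within + var_between) / n  # variance of the SRS mean
var_strat <- var_within / n                  # stratified mean, proportional allocation

c(srs = var_srs, stratified = var_strat)
```

With these assumed numbers the between-stratum part accounts for 25 of the 41 units of population variance, and that is the part stratification removes.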









Exercise 2: In the Alaska Department of Fish and Game paper (included with this HW) they use either stratification or post-stratification. Which one did they use? What were the strata? Why did they do this?

Solution:

In the 2002 ADF&G survey for Chinook Salmon we can see that the primary sampling method was regular stratification, with some secondary use of post-stratification. The paper states that the area of interest was split into three separate strata: the lower, middle, and upper portions of the Kuskokwim River. The stated reason for dividing the area into these strata is "differing proportions in gear type usage". Beyond that, each SRS within a stratum was designed as an "opportunistic" sample, i.e. samples were taken across time with a variety of gear, under the assumption that the samples would still be unbiased and independent. It is also stated, later in the sample design section, that the SRS in each stratum would be "post stratify(ed) by time and gear". Under an "opportunistic" sampling scheme this further post-stratification seems necessary to account for variance across sampling time and gear. The details of the post-stratification are given in the second paragraph of the "Data Processing, Analysis, and Reporting" section.
STAT 402: Homework 4



Exercise 3: We really wish to estimate the total expenditures for fuel in a city with N = 80000 households. We use random phone dialing to find a SRS of n = 400 households. However, we can divide the city into three strata, which we think will include houses with low fuel expense (first stratum), medium fuel expense (second stratum) and high fuel expense (third stratum) based on the typical temperatures and ages of houses in the three regions. We know the size of each stratum (N1 = 20000, N2 = 30000, N3 = 30000). However, we can't take a SRS of each stratum since phone numbers are only roughly related to address (and we are using random phone dialing).

However, we can sort the n = 400 sampled households into three samples, one from each stratum. We get the following:

    • Stratum One: x1 = $2500.00, s1 = $500.00, n1 = 120.

    • Stratum Two: x2 = $4000.00, s2 = $750.00, n2 = 150.

    • Stratum Three: x3 = $5500.00, s3 = $750.00, n3 = 130.

        a. What type of sampling is this?


Solution:

This is an example of post-stratification: a single SRS of size 400 is drawn, and the sampled households are sorted into strata only after the data have been collected.



        b. Why is this better than just ignoring the strata and considering this to be an old, boring SRS of size n = 400 which we did in the beginning of the course?

Solution:

This is better for the same reason a regular stratified sampling scheme is better. Recall the formula for the sample variance,

$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}.$$

Large differences between observations inflate this variance by definition. Here much of the spread in the pooled sample comes from differences between the three regions, so grouping the data into strata removes that between-stratum component and reduces the variance of our estimator for the mean or total.
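As a rough check using the summary statistics given in the exercise, the pooled variance of all 400 observations can be rebuilt from the stratum means and standard deviations and compared with the within-stratum spread. This is only a sketch, and the variable names are illustrative.

```r
x_bar <- c(2500, 4000, 5500)  # stratum sample means
s     <- c(500, 750, 750)     # stratum sample standard deviations
n_h   <- c(120, 150, 130)     # stratum sample sizes
n     <- sum(n_h)

# Total sum of squares = within-stratum part + between-stratum part
grand_mean <- sum(n_h * x_bar) / n
ss_within  <- sum((n_h - 1) * s^2)
ss_between <- sum(n_h * (x_bar - grand_mean)^2)

pooled_var <- (ss_within + ss_between) / (n - 1)
sqrt(pooled_var)  # pooled standard deviation, about 1369
s                 # within-stratum standard deviations, 500 to 750
```

The between-stratum sum of squares is roughly three times the within-stratum one, which is why ignoring the strata costs so much precision here.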

    c. Find a 95 percent confidence interval for the true mean fuel expenditure in the city. What is a 95 percent confidence interval?

Solution:

Code:

> Strata_means <- c(2500, 4000, 5500)
> Strata_sd <- c(500, 750, 750)                    # sample standard deviations s_h
> Strata_SSquared <- Strata_sd^2                   # sample variances s_h^2
> Strata_n <- c(120, 150, 130)
> Strata_N <- c(20000, 30000, 30000)
> Strata_Proportion <- Strata_N / sum(Strata_N)    # known population shares N_h / N
> Mean_estimator <- sum(Strata_Proportion * Strata_means)
> Mean_estimator
[1] 4187.5
> Margin_error <- 2 * sqrt(sum(Strata_Proportion^2 * ((Strata_N - Strata_n) / Strata_N) * (Strata_SSquared / Strata_n)))
> Margin_error
[1] 70.99214
> CI95 <- c(Mean_estimator - Margin_error, Mean_estimator + Margin_error)
> CI95
[1] 4116.508 4258.492

The weights are the known population shares N_h / N rather than the sample shares n_h / 400; weighting by the sample shares would simply reproduce the unstratified sample mean of 4037.5.

Finally we get an estimated mean of $4187.50 and an approximate 95 percent confidence interval of ($4116.51, $4258.49). A 95 percent confidence interval comes from a procedure that, over repeated sampling, produces intervals containing the true mean about 95% of the time.
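Since the exercise ultimately asks about total fuel expenditure, the same pieces can be scaled up to a total. The sketch below assumes post-stratification weights N_h / N from the known stratum sizes, the usual stratified variance formula applied to the realized stratum sample sizes, and a multiplier of 2 for an approximate 95 percent bound. The variable names are illustrative.

```r
x_bar <- c(2500, 4000, 5500)      # stratum sample means
s     <- c(500, 750, 750)         # stratum sample standard deviations
n_h   <- c(120, 150, 130)         # realized stratum sample sizes
N_h   <- c(20000, 30000, 30000)   # known stratum sizes
N     <- sum(N_h)

W        <- N_h / N                                  # assumed weights N_h / N
mean_hat <- sum(W * x_bar)
var_mean <- sum(W^2 * (1 - n_h / N_h) * s^2 / n_h)   # with finite population correction

total_hat    <- N * mean_hat
margin_total <- 2 * N * sqrt(var_mean)

c(lower = total_hat - margin_total,
  estimate = total_hat,
  upper = total_hat + margin_total)
```

Under these assumptions the estimated total is $335 million, with a margin of roughly $5.7 million.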

    d. Suppose that we complete the study and decide, before beginning the analysis, to just analyze it without sorting into strata. Would this be valid? Why or why not?

Solution:

This would be a valid technique. Usually when you have to use post-stratification, the sampling scheme is already set up for a "one big SRS" analysis. However, the point of post-stratification is that we have information about how the data can be stratified, and ignoring that information means leaving accuracy on the table.























Exercise 4: We want to estimate the concentration of available nitrogen in the soils of a region. Cold, wet soils generally have more available nitrogen, but also soils with nitrogen-fixing (alder) plants or areas showing high productivity might have higher soil nitrogen. We think we can very easily classify plots of ground into either low or high nitrogen plots just by looking at them, but the actual soil sampling and analysis is expensive.


To lower cost, we’ll do the following:


    (1) divide the region into N = 20000 reasonably-sized plots.

    (2) take a SRS of size m = 500 plots which we will visit and rapidly classify into either high N stratum or low N stratum (actually this would probably be done as a systematic sample, not an SRS, which we’ll see later).
    (3) We find that m1 = 300 of these plots are classified as low nitrogen and m2 = 200 plots as high nitrogen.
    (4) Now we take an SRS of size n1 = 30 from the low nitrogen (we hope) plots and, independently, n2 = 50 from the high nitrogen plots. We get the following:


Classified as low nitrogen: x1 = 30ppm, s1 = 10ppm

Classified as high nitrogen: x2 = 40ppm, s2 = 15ppm.

    a. Does it appear that the stratification will help us much? If we decide it doesn’t, can we just pretend we took a SRS of size 500 and ignore the stratification?

    b. Find a 95 percent confidence interval for the average nitrogen concentration.


Solution:


















Code:

> Strata_means <- c(30, 40)
> Strata_sd <- c(10, 15)                           # sample standard deviations s_h (ppm)
> Strata_SSquared <- Strata_sd^2                   # sample variances s_h^2
> Strata_n <- c(30, 50)
> Strata_m <- c(300, 200)
> Strata_Proportion <- Strata_m / sum(Strata_m)    # stratum shares estimated from the 500 classified plots
> Mean_estimator <- sum(Strata_Proportion * Strata_means)
> Mean_estimator
[1] 34
> Margin_error <- 2 * sqrt(sum(Strata_Proportion^2 * ((Strata_m - Strata_n) / Strata_m) * (Strata_SSquared / Strata_n)))
> Margin_error
[1] 2.545584
> CI95 <- c(Mean_estimator - Margin_error, Mean_estimator + Margin_error)
> CI95
[1] 31.45442 36.54558

This gives an estimated mean of 34 ppm and an approximate 95 percent confidence interval of (31.45, 36.55) ppm, treating the classified counts m1 = 300 and m2 = 200 as the stratum sizes.






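I do think the stratification helps: the two classes have clearly different means (30 ppm versus 40 ppm), so an SRS analysis would have a higher variance than the stratified one. Beyond that, because the plots were stratified before the soil samples were collected, we cannot redo the analysis as one large SRS. Only 80 plots were actually analyzed, and they were drawn at different rates from the two classes (30 of 300 versus 50 of 200), so pooled together they are not a simple random sample. The unweighted mean of the 80 plots, (30 × 30 + 50 × 40) / 80 = 36.25 ppm, over-represents the high nitrogen plots.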
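Because the stratum weights here are themselves estimated from the first-phase sample of 500 plots, a double-sampling (two-phase) variance estimate adds a between-stratum term to the within-stratum term, unlike the calculation above, which treats the first-phase counts as fixed stratum sizes. The sketch below uses a common large-sample approximation and ignores the finite population correction; this choice of formula is an assumption and may differ from the one used in the course. The variable names are illustrative.

```r
x_bar <- c(30, 40)     # second-phase sample means (ppm)
s     <- c(10, 15)     # second-phase sample standard deviations (ppm)
n_h   <- c(30, 50)     # second-phase sample sizes
m_h   <- c(300, 200)   # first-phase counts in each class
m     <- sum(m_h)

w        <- m_h / m    # estimated stratum weights
mean_hat <- sum(w * x_bar)

# Assumed approximation: within-stratum term plus a term for the estimated weights
var_within  <- sum(w^2 * s^2 / n_h)
var_between <- sum(w * (x_bar - mean_hat)^2) / m
margin      <- 2 * sqrt(var_within + var_between)

c(lower = mean_hat - margin, estimate = mean_hat, upper = mean_hat + margin)
```

Under this approximation the margin is about 2.8 ppm around the same point estimate of 34 ppm.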



















































