Question 1 (35 points)
Suppose that you are interested in estimating the causal relationship between $y$ and $x_1$. For this purpose, you can collect data on a control variable, $x_2$. (For concreteness, you might think of $y$ as final exam score, $x_1$ as class attendance and $x_2$ as SAT score.) Let $\tilde{\beta}_1$ be the simple regression estimate from $y$ on $x_1$ and let $\hat{\beta}_1$ be the multiple regression estimate from $y$ on $x_1$, controlling for $x_2$.
1) If $x_1$ is positively correlated with $x_2$, and $x_2$ has an effect on $y$, would you expect $\tilde{\beta}_1$ and $\hat{\beta}_1$ to be similar or different? Show the direction of the potential bias in $\tilde{\beta}_1$, the simple regression estimate, when we omit $x_2$. (10 points)
2) If $x_1$ is highly correlated with $x_2$, would you expect $se(\tilde{\beta}_1)$ or $se(\hat{\beta}_1)$ to be smaller (or unclear)? Discuss your answer based on the formula for $se(\hat{\beta}_1)$. (5 points)
3) Discuss how $se(\hat{\beta}_1)$ would be affected if you added more independent variables (in addition to $x_1$ and $x_2$) to the model. Suppose these additional variables have a small correlation with $x_1$ and a high correlation with $y$. (5 points)
4) Discuss how $se(\hat{\beta}_1)$ would be affected if the sample size were four times larger (that is, $n = 4x$ instead of $n = x$). Calculate the approximate expected change in $se(\hat{\beta}_1)$. (5 points)
5) Discuss how the scenarios in questions (2), (3) and (4) would affect the statistical significance of $\hat{\beta}_1$ and the size of confidence intervals constructed for $\hat{\beta}_1$. Discuss each scenario separately. (10 points)
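As a rough illustration of the setup in Question 1, the Python sketch below simulates hypothetical data and compares the simple and multiple regression estimates. Everything in it is an assumption made for illustration and not part of the assignment: the helper names `ols` and `simulate`, the correlation of 0.8, the coefficients 0.5 and 1.0, and the sample sizes 2000 and 8000. It relies on two textbook results: the in-sample identity $\tilde{\beta}_1 = \hat{\beta}_1 + \hat{\beta}_2 \tilde{\delta}_1$, where $\tilde{\delta}_1$ is the slope from regressing $x_2$ on $x_1$, and the formula $se(\hat{\beta}_1) = \hat{\sigma} / \sqrt{SST_1 (1 - R_1^2)}$.

```python
import numpy as np

def ols(X, y):
    """OLS with an intercept; returns (coefficients, standard errors)."""
    X = np.column_stack([np.ones(len(y)), X])    # prepend the intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]                # n - k - 1
    sigma2 = resid @ resid / dof                 # estimated error variance
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov))

def simulate(n, rho=0.8, seed=0):
    """Hypothetical data: corr(x1, x2) is about rho and both regressors raise y."""
    rng = np.random.default_rng(seed)
    x2 = rng.normal(size=n)
    x1 = rho * x2 + np.sqrt(1 - rho**2) * rng.normal(size=n)
    y = 1.0 + 0.5 * x1 + 1.0 * x2 + rng.normal(size=n)
    return x1, x2, y

x1, x2, y = simulate(n=2000)
b_simple, se_simple = ols(x1, y)                        # y on x1 only
b_multi, se_multi = ols(np.column_stack([x1, x2]), y)   # y on x1 and x2
delta, _ = ols(x1, x2)                                  # x2 on x1

# Gap between the simple slope and (multiple slope + beta2_hat * delta1);
# the identity says this is zero up to floating-point error.
gap = b_simple[1] - (b_multi[1] + b_multi[2] * delta[1])

# Re-estimate the multiple regression on a sample four times as large.
x1b, x2b, yb = simulate(n=8000, seed=1)
_, se_big = ols(np.column_stack([x1b, x2b]), yb)
ratio = se_big[1] / se_multi[1]

print(f"simple slope:   {b_simple[1]:.3f}")
print(f"multiple slope: {b_multi[1]:.3f}")
print(f"identity gap:   {gap:.2e}")
print(f"se ratio (n=8000 vs n=2000): {ratio:.3f}")
```

Changing `rho` or the coefficient on `x2` inside `simulate` is one way to see how the gap between the two slopes and the size of the standard error respond to the correlation between the regressors.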
Question 2 (40 points)
Use the data in DISCRIM.DTA to answer this question. These are zip code-level data on prices for various items at fast-food restaurants, along with characteristics of the zip code population, in New Jersey and Pennsylvania. The idea is to see whether fast-food restaurants charge higher prices in areas with a larger concentration of blacks.
1) Consider a model to explain the price of soda, psoda, in terms of the proportion of the population that is black and median income:
$$\text{psoda} = \beta_0 + \beta_1\,\text{prpblck} + \beta_2\,\text{income} + u.$$
Estimate this model by OLS and report the results, including the sample size and R-squared. Interpret the sign and magnitude of the coefficients on prpblck and income. (10 points)
2) Is the coefficient on prpblck statistically significant (different from zero) at the 0.05 significance level? What is the minimum significance level at which we can say the coefficient is significantly different from zero? Is the coefficient significantly different from "0.1" at the 0.05 significance level? (10 points)
3) Compare the estimate from part (1) with the simple regression estimate from psoda on prpblck. Is the discrimination effect larger or smaller when you control for income? Explain why the effect is larger or smaller when you control for income (Hint: you are expected to use correlations in your explanation). (10 points)
4) A model with a constant price elasticity with respect to income may be more appropriate. Report estimates of the model
$$\log(\text{psoda}) = \beta_0 + \beta_1\,\text{prpblck} + \beta_2 \log(\text{income}) + u.$$
If prpblck increases by 0.20 (20 percentage points), what is the estimated percentage change in psoda? (5 points)
5) Now add the variable prppov to the regression in part (1). Find the correlation between income and prppov. Is it roughly what you expected? Evaluate the following statement: "Because income and prppov are so highly correlated, they have no business being in the same regression". (5 points)
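For the log-level interpretation in part (4), a minimal Python sketch of the arithmetic may help. The function name `pct_change_log_level` and the slope value 0.10 are hypothetical placeholders and are not estimates from DISCRIM.DTA. The sketch compares the usual approximation $100 \cdot \beta_1 \cdot \Delta x$ with the exact change $100 \cdot (e^{\beta_1 \Delta x} - 1)$.

```python
import math

def pct_change_log_level(beta, delta_x):
    """Percentage change in y implied by log(y) = ... + beta * x
    when x changes by delta_x. Returns (approximate, exact)."""
    approx = 100.0 * beta * delta_x
    exact = 100.0 * (math.exp(beta * delta_x) - 1.0)
    return approx, exact

# Hypothetical slope for illustration only, NOT an estimate from the data.
approx, exact = pct_change_log_level(beta=0.10, delta_x=0.20)
print(f"approximate: {approx:.2f}%, exact: {exact:.2f}%")
```

For small values of `beta * delta_x` the two numbers are close, which is why the approximation is commonly reported.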
Question 3 (25 points)
The following model can be used to study whether campaign expenditures affect election outcomes:
$$\text{voteA} = \beta_0 + \beta_1 \log(\text{expendA}) + \beta_2 \log(\text{expendB}) + \beta_3\,\text{prtystrA} + u,$$
where voteA is the percentage of the vote received by Candidate A, expendA and expendB are campaign expenditures by Candidates A and B, and prtystrA is a measure of party strength for Candidate A (the percentage of the most recent presidential vote that went to A’s party).
1) Estimate the given model using the data in VOTE1.DTA and report the results in the usual form. Interpret the sign and magnitude of the estimated effect of A’s expenditures on the outcome ($\beta_1$). Do you think the estimated effect is the causal effect? What is the main assumption we make to say it is a causal effect? Discuss the validity of that assumption by providing an example. (10 points)
2) Interpret the explanatory power ($R^2$) of the estimated model. Is it evidence of a causal effect of expendA on voteA? (5 points)
3) Based on the estimation results, discuss the statistical significance of the estimated coefficient for A’s expenditure (at the 0.05 significance level). Explain your answer using three different approaches: 1) the reported t-value, 2) the reported p-value, 3) the reported confidence interval. (10 points)
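The three approaches in part (3) are equivalent ways of running the same two-sided test, which the following Python sketch tries to make concrete. It is only an illustration under stated assumptions: the function name `significance_checks` and the input numbers are hypothetical (they are not results from VOTE1.DTA), and the standard normal distribution is used as a large-sample stand-in for the t distribution that regression software reports.

```python
from statistics import NormalDist

def significance_checks(beta_hat, se, null=0.0, alpha=0.05):
    """Two-sided test of H0: beta = null, done three equivalent ways.
    Uses the standard normal as a large-sample approximation to the t distribution."""
    z = NormalDist()
    t_stat = (beta_hat - null) / se
    crit = z.inv_cdf(1 - alpha / 2)              # about 1.96 for alpha = 0.05
    p_value = 2 * (1 - z.cdf(abs(t_stat)))
    ci = (beta_hat - crit * se, beta_hat + crit * se)
    return {
        "t_stat": t_stat,
        "p_value": p_value,
        "ci": ci,
        "reject_by_t": abs(t_stat) > crit,       # approach 1: t-value vs critical value
        "reject_by_p": p_value < alpha,          # approach 2: p-value vs alpha
        "reject_by_ci": not (ci[0] <= null <= ci[1]),  # approach 3: null outside the CI
    }

# Hypothetical inputs for illustration only.
strong = significance_checks(beta_hat=2.0, se=0.5)
weak = significance_checks(beta_hat=0.5, se=0.5)
print(strong)
print(weak)
```

The `null` argument also covers tests against a nonzero value, such as the "0.1" hypothesis in Question 2, part (2).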