Problem 1 [25%]
It is mentioned in Chapter 7 of ISL that a cubic regression spline with one knot at $\xi$ can be obtained using a basis of the form $x$, $x^2$, $x^3$, $[x - \xi]^3_+$, where $[x - \xi]^3_+ = (x - \xi)^3$ if $x > \xi$ and equals 0 otherwise. We will now show that a function of the form
$$f(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \beta_4 [x - \xi]^3_+$$
is indeed a cubic regression spline, regardless of the values of $\beta_0, \beta_1, \beta_2, \beta_3, \beta_4$.
1. Find a cubic polynomial
$$f_1(x) = a_1 + b_1 x + c_1 x^2 + d_1 x^3$$
such that $f(x) = f_1(x)$ for all $x \le \xi$. Express $a_1, b_1, c_1, d_1$ in terms of $\beta_0, \beta_1, \beta_2, \beta_3, \beta_4$.
2. Find a cubic polynomial
$$f_2(x) = a_2 + b_2 x + c_2 x^2 + d_2 x^3$$
such that $f(x) = f_2(x)$ for all $x > \xi$. Express $a_2, b_2, c_2, d_2$ in terms of $\beta_0, \beta_1, \beta_2, \beta_3, \beta_4$. We have now established that $f(x)$ is a piecewise polynomial.
3. Show that $f_1(\xi) = f_2(\xi)$. That is, $f(x)$ is continuous at $\xi$.
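Before working through the algebra, it can help to evaluate the function numerically. The following is a minimal Python sketch, assuming NumPy is available; the coefficient values, the knot location, and the helper names `truncated_cubic` and `f` are hypothetical choices made only for illustration. It is a sanity check on continuity at the knot, not a substitute for the derivation the problem asks for.

```python
import numpy as np

def truncated_cubic(x, xi):
    """Return [x - xi]^3_+ : (x - xi)^3 where x > xi, and 0 otherwise."""
    x = np.asarray(x, dtype=float)
    return np.where(x > xi, (x - xi) ** 3, 0.0)

def f(x, beta, xi):
    """Evaluate b0 + b1*x + b2*x^2 + b3*x^3 + b4*[x - xi]^3_+."""
    x = np.asarray(x, dtype=float)
    b0, b1, b2, b3, b4 = beta
    return b0 + b1 * x + b2 * x**2 + b3 * x**3 + b4 * truncated_cubic(x, xi)

# Hypothetical coefficients and knot, chosen only for illustration.
beta = (1.0, -2.0, 0.5, 0.3, 4.0)
xi = 1.5
h = 1e-6

# Compare the function just to the left and just to the right of the knot.
left = float(f(xi - h, beta, xi))
right = float(f(xi + h, beta, xi))
gap = abs(right - left)
print(f"f(xi - h) = {left:.8f}, f(xi + h) = {right:.8f}, gap = {gap:.2e}")
```

The gap shrinks as `h` shrinks, which is consistent with continuity at the knot; trying other coefficient values is a quick way to see that this does not depend on the particular choice of the coefficients.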
Problem 2 [25%]
Apply the linear, cubic, and natural regression splines investigated in Chapter 7 of ISL to the Auto data set. Is there evidence for non-linear relationships in this data set? Create some informative plots to justify your answer.
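One way to look for non-linearity is to compare the residual sum of squares of a straight-line fit with that of a cubic regression spline. The sketch below is written in Python with NumPy and rests on several assumptions: the data are synthetic stand-ins (not the Auto data), the knot locations are arbitrary, and `spline_design` and `rss` are hypothetical helper names. It covers only the truncated-power cubic spline; natural splines and plotting are left out.

```python
import numpy as np

def spline_design(x, knots):
    """Design matrix with columns 1, x, x^2, x^3 and one [x - k]^3_+ per knot."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.where(x > k, (x - k) ** 3, 0.0) for k in knots]
    return np.column_stack(cols)

def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

# Synthetic stand-in for one predictor/response pair; NOT the Auto data.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 3.0, size=200))
y = 30.0 * np.exp(-x) + rng.normal(scale=1.0, size=x.size)

X_linear = np.column_stack([np.ones_like(x), x])
X_spline = spline_design(x, knots=[1.0, 2.0])  # arbitrary illustrative knots

rss_linear = rss(X_linear, y)
rss_spline = rss(X_spline, y)
print(f"RSS linear: {rss_linear:.1f}, RSS cubic spline: {rss_spline:.1f}")
```

Because the spline basis contains the columns of the linear model, its RSS can never be larger; what matters is whether the drop is large relative to the extra degrees of freedom, which a plot of both fitted curves over the scatter makes easy to judge.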
Problem 3 [25%]
You will now derive the Bayesian connection to the lasso as discussed in Section 6.2.2 of ISL.
1. Suppose that $y_i = \beta_0 + \sum_{j=1}^{p} x_{ij} \beta_j + \epsilon_i$, where $\epsilon_1, \ldots, \epsilon_n$ are independent and identically distributed from a normal distribution $N(0, 1)$. Write out the likelihood for the data as a function of the values $\beta$.
2. Assume that the prior for $\beta = (\beta_1, \ldots, \beta_p)$ is that the components are independent and identically distributed according to a Laplace distribution with mean zero and variance $c$. Write out the posterior for $\beta$ in this setting using Bayes' theorem.
3. Argue that the lasso estimate is the value of $\beta$ with maximal probability under this posterior distribution. Compute the log of the probability in order to make this point. Hint: The denominator (= the probability of the data) can be ignored in computing the maximum probability.
4. Suppose that $\epsilon_1, \ldots, \epsilon_n$ are independent and identically distributed according to the Laplace distribution. What are the maximum likelihood/MAP estimates of $\beta_j$ under this assumption? Hint: See https://en.wikipedia.org/wiki/Least_absolute_deviations
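The claim in part 3 can be checked numerically before it is argued on paper. The sketch below is a Python illustration under stated assumptions: the Laplace prior is parameterized by a scale $b$ with variance $c = 2b^2$, no prior is placed on $\beta_0$, the data are random placeholders, and the function names `neg_log_posterior` and `lasso_objective` are hypothetical. It checks that the negative log of (likelihood times prior) differs from half of a lasso objective only by a constant that does not depend on $\beta$.

```python
import numpy as np

def neg_log_posterior(beta0, beta, X, y, c):
    """Negative log of likelihood times prior (the unnormalized posterior)."""
    n, p = X.shape
    b = np.sqrt(c / 2.0)  # assumed Laplace scale: variance c = 2 * b^2
    resid = y - beta0 - X @ beta
    neg_log_lik = 0.5 * resid @ resid + 0.5 * n * np.log(2.0 * np.pi)
    neg_log_prior = np.sum(np.abs(beta)) / b + p * np.log(2.0 * b)
    return float(neg_log_lik + neg_log_prior)

def lasso_objective(beta0, beta, X, y, lam):
    """RSS plus lam times the L1 norm of the slope coefficients."""
    resid = y - beta0 - X @ beta
    return float(resid @ resid + lam * np.sum(np.abs(beta)))

# Random placeholder data, used only to exercise the two functions.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = rng.normal(size=50)
c = 2.0                         # hypothetical prior variance
lam = 2.0 / np.sqrt(c / 2.0)    # penalty implied by c in this parameterization

# The offset between the two criteria should be the same for every beta.
offsets = []
for _ in range(5):
    b0 = rng.normal()
    b = rng.normal(size=3)
    offsets.append(neg_log_posterior(b0, b, X, y, c)
                   - 0.5 * lasso_objective(b0, b, X, y, lam))
spread = max(offsets) - min(offsets)
print(f"spread of offsets across random beta: {spread:.2e}")
```

A spread at the level of floating-point error suggests that both criteria are minimized by the same $\beta$, which is the statement the written argument needs to establish.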
Problem 4 [25%]
Based on a true story, according to: The Drunkard’s Walk: How Randomness Rules Our Lives, Leonard Mlodinow
Suppose that you applied for life insurance and underwent a physical exam. The bad news is that your application was rejected because you tested positive for HIV. The test’s sensitivity is 99.7% and its specificity is 98.5% [https://en.wikipedia.org/wiki/Diagnosis_of_HIV/AIDS#Accuracy_of_HIV_testing]. However, after studying the CDC website, you find that in your demographic group (age, gender, race, ...) only one in 10,000 people is infected. What is the probability that you actually have HIV?
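The calculation is an application of Bayes' theorem to a diagnostic test. A minimal Python sketch follows, assuming the prevalence of one in 10,000 serves as the prior probability of infection; the function name `posterior_given_positive` is a hypothetical helper introduced here for illustration.

```python
def posterior_given_positive(sensitivity, specificity, prevalence):
    """P(infected | positive test) via Bayes' theorem."""
    # P(positive and infected)
    true_pos = sensitivity * prevalence
    # P(positive and not infected); the false-positive rate is 1 - specificity
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# Values taken from the problem statement.
p = posterior_given_positive(sensitivity=0.997,
                             specificity=0.985,
                             prevalence=1 / 10_000)
print(f"P(HIV | positive test) = {p:.4f}")
```

With these inputs the posterior comes out well below 1%, because false positives among the large uninfected population far outnumber true positives among the few infected.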