Exercise 1
Log into “cookdata.cn”, enroll in the course, and finish the online exercise there.
Exercise 2
We have a dataset with $n$ records in which the $i$-th record has one real-valued input attribute $x_i$ and one real-valued output response $y_i$. We have the following model with one unknown parameter $w$ which we want to learn from the data: $y_i \sim \mathcal{N}(e^{w x_i}, 1)$. Note that the variance is known and equal to one.
1. Is the task of estimating $w$ a linear regression problem or a non-linear regression problem?
2. Suppose you decide to do a maximum likelihood estimation of $w$. You do the math and figure out that you need $w$ to satisfy one of the following equations. Which one?
(A) $\sum_{i=1}^{n} x_i e^{w x_i} = \sum_{i=1}^{n} x_i y_i e^{w x_i}$
(B) $\sum_{i=1}^{n} x_i e^{2w x_i} = \sum_{i=1}^{n} x_i y_i e^{w x_i}$
(C) $\sum_{i=1}^{n} x_i^2 e^{w x_i} = \sum_{i=1}^{n} x_i y_i e^{w x_i}$
(D) $\sum_{i=1}^{n} x_i^2 e^{w x_i} = \sum_{i=1}^{n} x_i y_i e^{w x_i}/2$
(E) $\sum_{i=1}^{n} e^{w x_i} = \sum_{i=1}^{n} y_i e^{w x_i}$
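As a sanity check for this question (not part of the original assignment), the following Python sketch simulates data from $y_i \sim \mathcal{N}(e^{w x_i}, 1)$ with an assumed true value $w = 0.5$, computes the maximum likelihood estimate numerically with SciPy, and evaluates both sides of each candidate equation at that estimate; the candidate whose two sides (nearly) agree is the stationarity condition you should also obtain by hand.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
n, w_true = 200, 0.5                     # assumed sample size and true parameter
x = rng.uniform(-1.0, 1.0, size=n)
y = rng.normal(loc=np.exp(w_true * x), scale=1.0)

# Negative log-likelihood of y_i ~ N(exp(w * x_i), 1), up to an additive constant.
nll = lambda w: 0.5 * np.sum((y - np.exp(w * x)) ** 2)
w_mle = minimize_scalar(nll, bounds=(-5.0, 5.0), method="bounded").x

# Evaluate both sides of each candidate stationarity equation at the numerical MLE.
e = np.exp(w_mle * x)
candidates = {
    "A": (np.sum(x * e),        np.sum(x * y * e)),
    "B": (np.sum(x * e ** 2),   np.sum(x * y * e)),
    "C": (np.sum(x ** 2 * e),   np.sum(x * y * e)),
    "D": (np.sum(x ** 2 * e),   np.sum(x * y * e) / 2),
    "E": (np.sum(e),            np.sum(y * e)),
}
for name, (lhs, rhs) in candidates.items():
    print(f"({name}) |lhs - rhs| = {abs(lhs - rhs):.4f}")
```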
Exercise 3 (Linear regression as a projection)
Consider a multivariate linear model $y = Xw + \epsilon$ with $y \in \mathbb{R}^{n\times 1}$, $X \in \mathbb{R}^{n\times(d+1)}$, $w \in \mathbb{R}^{(d+1)\times 1}$, and $\epsilon \in \mathbb{R}^{n\times 1}$, where $\epsilon \sim \mathcal{N}(0, \sigma^2 I)$ follows the normal distribution.
1. Show that the linear regression predictor is given by $\hat{y} = X(X^T X)^{-1} X^T y$.
2. Let $P = X(X^T X)^{-1} X^T$. Show that $P$ has only 0 and 1 as eigenvalues.
3. Show that $\hat{w} = (X^T X)^{-1} X^T y$ is an unbiased estimator of $w$, i.e., $\mathbb{E}(\hat{w}) = w$. Also show that $\mathrm{Var}(\hat{w}) = (X^T X)^{-1}\sigma^2$. (Note that by definition, $\mathrm{Var}(\hat{w}) = \mathbb{E}[(\hat{w} - \mathbb{E}(\hat{w}))(\hat{w} - \mathbb{E}(\hat{w}))^T]$.)
4. Recall the definition of the $R^2$ score: $R^2 := 1 - \frac{SS_{res}}{SS_{tot}}$, where $SS_{tot} = \sum_{i=1}^{n}(y_i - \bar{y})^2$, $SS_{reg} = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2$, and $SS_{res} = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$. Prove that for linear regression, $SS_{tot} = SS_{reg} + SS_{res}$. (So the $R^2$ score can also be defined as $R^2 = \frac{SS_{reg}}{SS_{tot}}$.) A numerical check of these claims is sketched below.
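The claims in this exercise can be checked numerically before you prove them. Below is a minimal Python sketch under arbitrary assumptions (simulated data, $n = 50$, $d = 3$, a design matrix that includes an intercept column): it forms the projection matrix $P$, inspects its eigenvalues, and compares $SS_{tot}$ with $SS_{reg} + SS_{res}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 50, 3                                   # assumed sizes, for illustration only
X = np.column_stack([np.ones(n), rng.normal(size=(n, d))])   # design with intercept
y = X @ rng.normal(size=d + 1) + rng.normal(scale=0.5, size=n)

P = X @ np.linalg.inv(X.T @ X) @ X.T           # projection ("hat") matrix
y_hat = P @ y                                  # part 1: linear regression predictor

# Part 2: the eigenvalues of P should all be (numerically) 0 or 1.
print(np.round(np.linalg.eigvalsh(P), 6))

# Part 4: SS_tot = SS_reg + SS_res when the model contains an intercept.
y_bar = y.mean()
ss_tot = np.sum((y - y_bar) ** 2)
ss_reg = np.sum((y_hat - y_bar) ** 2)
ss_res = np.sum((y - y_hat) ** 2)
print(ss_tot, ss_reg + ss_res)                 # the two printed values should agree
```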
Exercise 4 (Generalized Cross-Validation) Consider ridge regression:
$$\min_{w} \; (y - Xw)^T (y - Xw) + \lambda \|w\|_2^2.$$
It has the solution $\hat{w} = (X^T X + \lambda I)^{-1} X^T y$ and prediction $\hat{y} = X(X^T X + \lambda I)^{-1} X^T y = Py$, where $P = X(X^T X + \lambda I)^{-1} X^T$ is the projection matrix.
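For concreteness, here is a minimal Python sketch (simulated data; the sizes and the value $\lambda = 0.1$ are arbitrary assumptions) that computes the ridge solution and the matrix $P$ and confirms that the prediction equals $Py$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, lam = 50, 3, 0.1                         # assumed sizes and ridge penalty
X = rng.normal(size=(n, d + 1))
y = X @ rng.normal(size=d + 1) + rng.normal(scale=0.5, size=n)

I = np.eye(d + 1)
w_ridge = np.linalg.solve(X.T @ X + lam * I, X.T @ y)    # (X^T X + lam*I)^{-1} X^T y
P = X @ np.linalg.solve(X.T @ X + lam * I, X.T)          # ridge projection matrix
print(np.allclose(P @ y, X @ w_ridge))                   # prediction satisfies y_hat = P y
```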
1. Define the leave-one-out cross-validation estimator as
$$\hat{w}^{[k]} = \arg\min_{w} \Big[ \sum_{i=1,\, i\neq k}^{n} (y_i - x_i^T w)^2 + \lambda \|w\|_2^2 \Big].$$
Show that $\hat{w}^{[k]} = (X^T X + \lambda I - x_k x_k^T)^{-1}(X^T y - x_k y_k)$.
2. Define the ordinary cross-validation (OCV) mean squared error as
$$V_0(\lambda) = \frac{1}{n}\sum_{k=1}^{n}\big(x_k^T \hat{w}^{[k]} - y_k\big)^2.$$
Show that $V_0(\lambda)$ can be rewritten as
$$V_0(\lambda) = \frac{1}{n}\sum_{k=1}^{n}\left(\frac{\hat{y}_k - y_k}{1 - p_{kk}}\right)^2,$$
where $\hat{y}_k = \sum_{j=1}^{n} p_{kj}\, y_j$ and $p_{kj}$ is the $(k, j)$-entry of $P$.
(Hint: You may need to use the Sherman–Morrison formula for a nonsingular matrix $A$ and vectors $x$ and $y$ with $y^T A^{-1} x \neq -1$:
$$(A + x y^T)^{-1} = A^{-1} - \frac{A^{-1} x y^T A^{-1}}{1 + y^T A^{-1} x}.)$$
3. Define the weights as $w_k = \left(\dfrac{1 - p_{kk}}{\frac{1}{n}\operatorname{tr}(I - P)}\right)^2$ and the weighted OCV as
$$V(\lambda) = \frac{1}{n}\sum_{k=1}^{n} w_k \big(x_k^T \hat{w}^{[k]} - y_k\big)^2.$$
Show that $V(\lambda)$ can be written as
$$V(\lambda) = \frac{\frac{1}{n}\big\|(I - P)y\big\|^2}{\big[1 - \operatorname{tr}(P)/n\big]^2}.$$
(A numerical check of parts 2 and 3 is sketched below.)
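As referenced above, here is a minimal Python sketch (simulated data, $\lambda = 0.1$; all sizes are arbitrary assumptions) that compares brute-force leave-one-out refitting with the shortcut from part 2 and evaluates the expression from part 3.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, lam = 40, 3, 0.1                         # assumed sizes and ridge penalty
X = rng.normal(size=(n, d + 1))
y = X @ rng.normal(size=d + 1) + rng.normal(scale=0.5, size=n)

I = np.eye(d + 1)
P = X @ np.linalg.solve(X.T @ X + lam * I, X.T)
y_hat = P @ y

# Brute-force leave-one-out: refit the ridge estimator without observation k.
loo_residual = np.empty(n)
for k in range(n):
    mask = np.arange(n) != k
    Xk, yk = X[mask], y[mask]
    w_k = np.linalg.solve(Xk.T @ Xk + lam * I, Xk.T @ yk)
    loo_residual[k] = X[k] @ w_k - y[k]

# Part 2 shortcut: (y_hat_k - y_k) / (1 - p_kk) should reproduce the residuals above.
shortcut = (y_hat - y) / (1.0 - np.diag(P))
print(np.allclose(loo_residual, shortcut))

V0 = np.mean(shortcut ** 2)                                        # OCV score
V = np.mean((y - y_hat) ** 2) / (1.0 - np.trace(P) / n) ** 2       # part 3 expression
print(V0, V)
```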
Exercise 5 (Solving LASSO by ADMM) The alternating direction method of multipliers (ADMM) is a very useful algorithm for solving the constrained optimization problem:
$$\min_{\theta,\, z} \; f(\theta) + g(z), \quad \text{subject to } A\theta + Bz = c.$$
The algorithm uses a Lagrange multiplier $u$ for the constraint and proceeds as follows:
1. $\theta^{(k+1)} = \arg\min_{\theta} L(\theta, z^{(k)}, u^{(k)})$;
2. $z^{(k+1)} = \arg\min_{z} L(\theta^{(k+1)}, z, u^{(k)})$;
3. $u^{(k+1)} = u^{(k)} + A\theta^{(k+1)} + Bz^{(k+1)} - c$;
where $L$ is the augmented Lagrangian defined as
$$L(\theta, z, u) = f(\theta) + g(z) + u^T(A\theta + Bz - c) + \frac{1}{2}\|A\theta + Bz - c\|_2^2.$$
An advantage of ADMM is that it involves no tuning parameter such as the step size in gradient algorithms. Please write down the ADMM steps for solving the LASSO problem:
$$\min_{w} \; \|y - Xw\|_2^2 + \lambda \|w\|_1.$$
(Hint: In order to use ADMM, you have to introduce an auxiliary variable and a suitable constraint. Please give the explicit formulae by solving the “arg min” in each step of ADMM.)
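For reference only, the sketch below is one possible numerical implementation, using the common split $w = z$ (so that $A = I$, $B = -I$, $c = 0$) and the resulting closed-form updates; treat it as a check on your own derivation rather than as the derivation itself. The simulated data, the sparsity pattern, and $\lambda$ are arbitrary assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding operator S_t(v) = sign(v) * max(|v| - t, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_admm(X, y, lam, n_iter=500):
    """ADMM for  min_w ||y - Xw||_2^2 + lam * ||w||_1  with the split w = z.

    Uses the augmented Lagrangian above with A = I, B = -I, c = 0 and unit
    penalty weight; each iteration solves the corresponding 'arg min' in closed form.
    """
    d = X.shape[1]
    w = z = u = np.zeros(d)
    M = 2.0 * X.T @ X + np.eye(d)          # fixed matrix of the theta-step
    b = 2.0 * X.T @ y
    for _ in range(n_iter):
        w = np.linalg.solve(M, b + z - u)  # theta-step: ridge-like linear solve
        z = soft_threshold(w + u, lam)     # z-step: elementwise soft thresholding
        u = u + w - z                      # multiplier update
    return z                               # z carries the exact sparsity pattern

# Toy usage on simulated data (sizes, sparsity, and lambda are arbitrary assumptions).
rng = np.random.default_rng(4)
X = rng.normal(size=(100, 10))
w_true = np.zeros(10)
w_true[:3] = [2.0, -1.0, 0.5]
y = X @ w_true + rng.normal(scale=0.1, size=100)
print(np.round(lasso_admm(X, y, lam=5.0), 3))
```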