Exercise 1
Log into “cookdata.cn”, enroll in the course, and finish the online exercise there.
Exercise 2
We have a dataset with $n$ records in which the $i$-th record has one real-valued input attribute $x_i$ and one real-valued output response $y_i$. We have the following model with one unknown parameter $w$ which we want to learn from the data: $y_i \sim \mathcal{N}(e^{w x_i}, 1)$. Note that the variance is known and equal to one.
1. Is the task of estimating $w$ a linear regression problem or a non-linear regression problem?
2. Suppose you decide to do a maximum likelihood estimation of $w$. You do the math and figure out that you need $w$ to satisfy one of the following equations. Which one?
(A) $\sum_{i=1}^{n} x_i e^{w x_i} = \sum_{i=1}^{n} x_i y_i e^{w x_i}$
(B) $\sum_{i=1}^{n} x_i e^{2w x_i} = \sum_{i=1}^{n} x_i y_i e^{w x_i}$
(C) $\sum_{i=1}^{n} x_i^2 e^{w x_i} = \sum_{i=1}^{n} x_i y_i e^{w x_i}$
(D) $\sum_{i=1}^{n} x_i^2 e^{w x_i} = \sum_{i=1}^{n} x_i y_i e^{w x_i}/2$
(E) $\sum_{i=1}^{n} e^{w x_i} = \sum_{i=1}^{n} y_i e^{w x_i}$
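As a sanity check for this question (not part of the original assignment), the following Python sketch simulates data from $y_i \sim \mathcal{N}(e^{w x_i}, 1)$ with an assumed true value $w = 0.5$, computes the maximum likelihood estimate numerically with SciPy, and evaluates both sides of each candidate equation at that estimate; the candidate whose two sides (nearly) agree is the stationarity condition you should also obtain by hand.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
n, w_true = 200, 0.5                     # assumed sample size and true parameter
x = rng.uniform(-1.0, 1.0, size=n)
y = rng.normal(loc=np.exp(w_true * x), scale=1.0)

# Negative log-likelihood of y_i ~ N(exp(w * x_i), 1), up to an additive constant.
nll = lambda w: 0.5 * np.sum((y - np.exp(w * x)) ** 2)
w_mle = minimize_scalar(nll, bounds=(-5.0, 5.0), method="bounded").x

# Evaluate both sides of each candidate stationarity equation at the numerical MLE.
e = np.exp(w_mle * x)
candidates = {
    "A": (np.sum(x * e),        np.sum(x * y * e)),
    "B": (np.sum(x * e ** 2),   np.sum(x * y * e)),
    "C": (np.sum(x ** 2 * e),   np.sum(x * y * e)),
    "D": (np.sum(x ** 2 * e),   np.sum(x * y * e) / 2),
    "E": (np.sum(e),            np.sum(y * e)),
}
for name, (lhs, rhs) in candidates.items():
    print(f"({name}) |lhs - rhs| = {abs(lhs - rhs):.4f}")
```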
Exercise 3 (Linear regression as a projection)
Consider a multivariate linear model $y = Xw + \epsilon$ with $y \in \mathbb{R}^{n\times 1}$, $X \in \mathbb{R}^{n\times(d+1)}$, $w \in \mathbb{R}^{(d+1)\times 1}$, and $\epsilon \in \mathbb{R}^{n\times 1}$, where $\epsilon \sim \mathcal{N}(0, \sigma^2 I)$ follows the normal distribution.
1. Show that the linear regression predictor is given by $\hat{y} = X(X^T X)^{-1} X^T y$.
2. Let $P = X(X^T X)^{-1} X^T$. Show that $P$ has only 0 and 1 as eigenvalues.
3. Show that $\hat{w} = (X^T X)^{-1} X^T y$ is an unbiased estimator of $w$, i.e., $\mathbb{E}(\hat{w}) = w$. Also show that $\mathrm{Var}(\hat{w}) = (X^T X)^{-1}\sigma^2$. (Note that by definition, $\mathrm{Var}(\hat{w}) = \mathbb{E}[(\hat{w} - \mathbb{E}(\hat{w}))(\hat{w} - \mathbb{E}(\hat{w}))^T]$.)
4. Recall the definition of the $R^2$ score: $R^2 := 1 - \frac{SS_{res}}{SS_{tot}}$, where $SS_{tot} = \sum_{i=1}^{n}(y_i - \bar{y})^2$, $SS_{reg} = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2$, and $SS_{res} = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$. Prove that for linear regression, $SS_{tot} = SS_{reg} + SS_{res}$. (So the $R^2$ score can also be defined as $R^2 = \frac{SS_{reg}}{SS_{tot}}$.) A numerical check of these claims is sketched below.
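The claims in this exercise can be checked numerically before you prove them. Below is a minimal Python sketch under arbitrary assumptions (simulated data, $n = 50$, $d = 3$, a design matrix that includes an intercept column): it forms the projection matrix $P$, inspects its eigenvalues, and compares $SS_{tot}$ with $SS_{reg} + SS_{res}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 50, 3                                   # assumed sizes, for illustration only
X = np.column_stack([np.ones(n), rng.normal(size=(n, d))])   # design with intercept
y = X @ rng.normal(size=d + 1) + rng.normal(scale=0.5, size=n)

P = X @ np.linalg.inv(X.T @ X) @ X.T           # projection ("hat") matrix
y_hat = P @ y                                  # part 1: linear regression predictor

# Part 2: the eigenvalues of P should all be (numerically) 0 or 1.
print(np.round(np.linalg.eigvalsh(P), 6))

# Part 4: SS_tot = SS_reg + SS_res when the model contains an intercept.
y_bar = y.mean()
ss_tot = np.sum((y - y_bar) ** 2)
ss_reg = np.sum((y_hat - y_bar) ** 2)
ss_res = np.sum((y - y_hat) ** 2)
print(ss_tot, ss_reg + ss_res)                 # the two printed values should agree
```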
Exercise 4 (Generalized Cross-Validation) Consider ridge regression:
$$\min_{w} \; (y - Xw)^T (y - Xw) + \lambda \|w\|_2^2.$$
It has the solution $\hat{w} = (X^T X + \lambda I)^{-1} X^T y$ and prediction $\hat{y} = X(X^T X + \lambda I)^{-1} X^T y = Py$, where $P = X(X^T X + \lambda I)^{-1} X^T$ is the projection matrix.
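For concreteness, here is a minimal Python sketch (simulated data; the sizes and the value $\lambda = 0.1$ are arbitrary assumptions) that computes the ridge solution and the matrix $P$ and confirms that the prediction equals $Py$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, lam = 50, 3, 0.1                         # assumed sizes and ridge penalty
X = rng.normal(size=(n, d + 1))
y = X @ rng.normal(size=d + 1) + rng.normal(scale=0.5, size=n)

I = np.eye(d + 1)
w_ridge = np.linalg.solve(X.T @ X + lam * I, X.T @ y)    # (X^T X + lam*I)^{-1} X^T y
P = X @ np.linalg.solve(X.T @ X + lam * I, X.T)          # ridge projection matrix
print(np.allclose(P @ y, X @ w_ridge))                   # prediction satisfies y_hat = P y
```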
1. Define the leave-one-out cross-validation estimator as
$$\hat{w}^{[k]} = \arg\min_{w} \Big[ \sum_{i=1,\, i\neq k}^{n} (y_i - x_i^T w)^2 + \lambda \|w\|_2^2 \Big].$$
Show that $\hat{w}^{[k]} = (X^T X + \lambda I - x_k x_k^T)^{-1}(X^T y - x_k y_k)$.
2. Define the ordinary cross-validation (OCV) mean squared error as
$$V_0(\lambda) = \frac{1}{n}\sum_{k=1}^{n}\big(x_k^T \hat{w}^{[k]} - y_k\big)^2.$$
Show that $V_0(\lambda)$ can be rewritten as
$$V_0(\lambda) = \frac{1}{n}\sum_{k=1}^{n}\left(\frac{\hat{y}_k - y_k}{1 - p_{kk}}\right)^2,$$
where $\hat{y}_k = \sum_{j=1}^{n} p_{kj}\, y_j$ and $p_{kj}$ is the $(k, j)$-entry of $P$.
(Hint: You may need to use the Sherman–Morrison formula for a nonsingular matrix $A$ and vectors $x$ and $y$ with $y^T A^{-1} x \neq -1$:
$$(A + x y^T)^{-1} = A^{-1} - \frac{A^{-1} x y^T A^{-1}}{1 + y^T A^{-1} x}.)$$
3. Define the weights as $w_k = \left(\dfrac{1 - p_{kk}}{\frac{1}{n}\operatorname{tr}(I - P)}\right)^2$ and the weighted OCV as
$$V(\lambda) = \frac{1}{n}\sum_{k=1}^{n} w_k \big(x_k^T \hat{w}^{[k]} - y_k\big)^2.$$
Show that $V(\lambda)$ can be written as
$$V(\lambda) = \frac{\frac{1}{n}\big\|(I - P)y\big\|^2}{\big[1 - \operatorname{tr}(P)/n\big]^2}.$$
(A numerical check of parts 2 and 3 is sketched below.)
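As referenced above, here is a minimal Python sketch (simulated data, $\lambda = 0.1$; all sizes are arbitrary assumptions) that compares brute-force leave-one-out refitting with the shortcut from part 2 and evaluates the expression from part 3.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, lam = 40, 3, 0.1                         # assumed sizes and ridge penalty
X = rng.normal(size=(n, d + 1))
y = X @ rng.normal(size=d + 1) + rng.normal(scale=0.5, size=n)

I = np.eye(d + 1)
P = X @ np.linalg.solve(X.T @ X + lam * I, X.T)
y_hat = P @ y

# Brute-force leave-one-out: refit the ridge estimator without observation k.
loo_residual = np.empty(n)
for k in range(n):
    mask = np.arange(n) != k
    Xk, yk = X[mask], y[mask]
    w_k = np.linalg.solve(Xk.T @ Xk + lam * I, Xk.T @ yk)
    loo_residual[k] = X[k] @ w_k - y[k]

# Part 2 shortcut: (y_hat_k - y_k) / (1 - p_kk) should reproduce the residuals above.
shortcut = (y_hat - y) / (1.0 - np.diag(P))
print(np.allclose(loo_residual, shortcut))

V0 = np.mean(shortcut ** 2)                                        # OCV score
V = np.mean((y - y_hat) ** 2) / (1.0 - np.trace(P) / n) ** 2       # part 3 expression
print(V0, V)
```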
Exercise 5 (Solving LASSO by ADMM) The alternating direction method of multipliers (ADMM) is a very useful algorithm for solving the constrained optimization problem:
$$\min_{\theta,\, z} \; f(\theta) + g(z), \quad \text{subject to } A\theta + Bz = c.$$
The algorithm uses a Lagrange multiplier $u$ for the constraint and proceeds as follows:
1. $\theta^{(k+1)} = \arg\min_{\theta} L(\theta, z^{(k)}, u^{(k)})$;
2. $z^{(k+1)} = \arg\min_{z} L(\theta^{(k+1)}, z, u^{(k)})$;
3. $u^{(k+1)} = u^{(k)} + A\theta^{(k+1)} + Bz^{(k+1)} - c$;
where $L$ is the augmented Lagrangian defined as
$$L(\theta, z, u) = f(\theta) + g(z) + u^T(A\theta + Bz - c) + \frac{1}{2}\|A\theta + Bz - c\|_2^2.$$
An advantage of ADMM is that it involves no tuning parameter such as the step size in gradient algorithms. Please write down the ADMM steps for solving the LASSO problem:
$$\min_{w} \; \|y - Xw\|_2^2 + \lambda \|w\|_1.$$
(Hint: In order to use ADMM, you have to introduce an auxiliary variable and a suitable constraint. Please give the explicit formulae by solving the “arg min” in each step of ADMM.)
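For reference only, the sketch below is one possible numerical implementation, using the common split $w = z$ (so that $A = I$, $B = -I$, $c = 0$) and the resulting closed-form updates; treat it as a check on your own derivation rather than as the derivation itself. The simulated data, the sparsity pattern, and $\lambda$ are arbitrary assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding operator S_t(v) = sign(v) * max(|v| - t, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_admm(X, y, lam, n_iter=500):
    """ADMM for  min_w ||y - Xw||_2^2 + lam * ||w||_1  with the split w = z.

    Uses the augmented Lagrangian above with A = I, B = -I, c = 0 and unit
    penalty weight; each iteration solves the corresponding 'arg min' in closed form.
    """
    d = X.shape[1]
    w = z = u = np.zeros(d)
    M = 2.0 * X.T @ X + np.eye(d)          # fixed matrix of the theta-step
    b = 2.0 * X.T @ y
    for _ in range(n_iter):
        w = np.linalg.solve(M, b + z - u)  # theta-step: ridge-like linear solve
        z = soft_threshold(w + u, lam)     # z-step: elementwise soft thresholding
        u = u + w - z                      # multiplier update
    return z                               # z carries the exact sparsity pattern

# Toy usage on simulated data (sizes, sparsity, and lambda are arbitrary assumptions).
rng = np.random.default_rng(4)
X = rng.normal(size=(100, 10))
w_true = np.zeros(10)
w_true[:3] = [2.0, -1.0, 0.5]
y = X @ w_true + rng.normal(scale=0.1, size=100)
print(np.round(lasso_admm(X, y, lam=5.0), 3))
```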