Exercise 1
Log into “cookdata.cn”, enroll in the course “êâ‰Æ Ú”, and finish the online exercise there.
Exercise 2 Consider training an AdaBoost classifier using decision stumps on the five-point data set (4 “+” samples and 1 “-” sample):
1. Which examples will have their weights increased at the end of the first iteration? Circle them.
2. How many iterations will it take to achieve zero training error? Explain by carrying out the computation with the AdaBoost algorithm (the sketch after this exercise can help check your numbers).
3. Can you add one more sample to the training set so that AdaBoost will achieve zero training error in two steps? If not, explain why.
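The following is a minimal Python/NumPy sketch of AdaBoost with exhaustively searched decision stumps; it may help in checking the weight updates in part 1 and the iteration count in part 2. The coordinates in X below are placeholders, not the five points from the exercise figure, so substitute the actual data before relying on the output.

# Hypothetical check for Exercise 2 (NumPy only). X holds placeholder points.
import numpy as np

X = np.array([[0., 1.], [1., 0.], [1., 2.], [2., 1.], [1., 1.]])  # placeholder coordinates
y = np.array([1, 1, 1, 1, -1])                                    # 4 "+" and 1 "-" label

def best_stump(X, y, w):
    # Exhaustive search over axis-aligned stumps h(x) = s * sign(x_j - theta).
    best = None
    for j in range(X.shape[1]):
        for theta in np.unique(X[:, j]):
            for s in (1, -1):
                pred = s * np.where(X[:, j] >= theta, 1, -1)
                err = w[pred != y].sum()          # weighted training error of this stump
                if best is None or err < best[0]:
                    best = (err, pred)
    return best

w = np.full(len(y), 1.0 / len(y))   # start with uniform weights 1/N
F = np.zeros(len(y))                # running weighted vote of the stumps
for t in range(1, 11):
    eps, pred = best_stump(X, y, w)
    alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-12))
    w *= np.exp(-alpha * y * pred)  # misclassified points get larger weights (part 1)
    w /= w.sum()
    F += alpha * pred
    train_err = np.mean(np.sign(F) != y)
    print(f"iteration {t}: eps={eps:.3f}, training error={train_err:.2f}")
    if train_err == 0:              # iteration count relevant to part 2
        break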
Exercise 3 (Hierarchical Clustering)
Suppose that we have four observations, for which we compute a dissimilarity matrix,
given by
\[
\begin{pmatrix}
0 & 0.3 & 0.4 & 0.7 \\
0.3 & 0 & 0.5 & 0.8 \\
0.4 & 0.5 & 0 & 0.45 \\
0.7 & 0.8 & 0.45 & 0
\end{pmatrix}.
\]
For instance, the dissimilarity between the first and second observations is 0.3, and the dissimilarity between the second and fourth observations is 0.8.
1. On the basis of this dissimilarity matrix, sketch the dendrogram that results from hierarchically clustering these four observations using complete linkage. Be sure to indicate on the plot the height at which each fusion (merge) occurs, as well as the observations corresponding to each leaf in the dendrogram. (The code sketch after this exercise can be used to check your answers.)
2. Repeat 1, this time using single linkage clustering.
3. Suppose that we cut the dendrogram obtained in 1 such that two clusters result. Which observations are in each cluster?
4. Suppose that we cut the dendrogram obtained in 2 such that two clusters result. Which observations are in each cluster?
5. It is mentioned in the chapter that at each fusion in the dendrogram, the position of the two clusters being fused can be swapped without changing the meaning of the dendrogram. Draw a dendrogram that is equivalent to the dendrogram in 1, for which two or more of the leaves are repositioned, but for which the meaning of the dendrogram is the same.
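Assuming SciPy is available, the following Python sketch reproduces the complete- and single-linkage merges numerically from the dissimilarity matrix given above; it can be used to check the hand-drawn dendrograms in parts 1–2 and the two-cluster cuts in parts 3–4.

# Verify the linkage results for Exercise 3 with SciPy (observations are labeled 0-3 here).
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
from scipy.spatial.distance import squareform

D = np.array([[0.0, 0.3, 0.4, 0.7],
              [0.3, 0.0, 0.5, 0.8],
              [0.4, 0.5, 0.0, 0.45],
              [0.7, 0.8, 0.45, 0.0]])
condensed = squareform(D)            # condensed form expected by linkage()

for method in ("complete", "single"):
    Z = linkage(condensed, method=method)
    print(method, "fusion heights:", Z[:, 2])                       # heights for the dendrogram
    print(method, "two-cluster cut:", fcluster(Z, t=2, criterion="maxclust"))
# dendrogram(Z) draws the corresponding plot when a matplotlib backend is available.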
Exercise 4 (Out-of-bag error of random forest approaches its leave-one-out CV error)
Consider the regression problem with data $Z = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$. The bootstrap aggregation draws $N$ samples from $Z$ independently with replacement. Let $\hat{P}$ be the empirical distribution putting equal probability $1/N$ on each of the data points $(x_i, y_i)$, i.e., the cumulative distribution function of $\hat{P}$ is
\[
F_{\hat{P}}(x, y) = \frac{1}{N} \sum_{i=1}^{N} I(x_i \le x,\; y_i \le y).
\]
For each group of bootstrap data $Z^{*b} = \{(x_1^{*b}, y_1^{*b}), (x_2^{*b}, y_2^{*b}), \dots, (x_N^{*b}, y_N^{*b})\}$, $b = 1, 2, \dots, B$, it follows that $(x_i^{*b}, y_i^{*b}) \sim \hat{P}$. We can fit a univariate linear model $\hat{f}^{*b}(x) = \hat{w}_0^{*b} + \hat{w}_1^{*b} x$.
1. Show that the OLS fitted coefficients are
\[
\hat{w}_0^{*b} = \frac{\sum_{i=1}^{n} (x_i^{*b})^2 \sum_{i=1}^{n} y_i^{*b} - \sum_{i=1}^{n} x_i^{*b} \sum_{i=1}^{n} x_i^{*b} y_i^{*b}}{n \sum_{i=1}^{n} (x_i^{*b})^2 - \left( \sum_{i=1}^{n} x_i^{*b} \right)^2},
\qquad
\hat{w}_1^{*b} = \frac{n \sum_{i=1}^{n} x_i^{*b} y_i^{*b} - \sum_{i=1}^{n} x_i^{*b} \sum_{i=1}^{n} y_i^{*b}}{n \sum_{i=1}^{n} (x_i^{*b})^2 - \left( \sum_{i=1}^{n} x_i^{*b} \right)^2}.
\]
2. The bagging estimate is defined by
\[
\hat{f}_{\mathrm{bag}}(x) = \frac{1}{B} \sum_{b=1}^{B} \hat{f}^{*b}(x).
\]
Show that as $B \to \infty$,
\[
\frac{1}{B} \sum_{b=1}^{B} \hat{w}_i^{*b} \xrightarrow{\;P\;} E_{\hat{P}}\, \hat{w}_i^{*}, \quad i = 0, 1,
\]
where $\hat{f}^{*}(x) = \hat{w}_0^{*} + \hat{w}_1^{*} x$ is a univariate linear model fitted from $Z^{*} = \{(x_1^{*}, y_1^{*}), (x_2^{*}, y_2^{*}), \dots, (x_N^{*}, y_N^{*})\}$ with each $(x_i^{*}, y_i^{*}) \sim \hat{P}$. (Hint: Use the law of large numbers.)
(Remark: Therefore, $\hat{f}_{\mathrm{bag}}(x) \xrightarrow{\;P\;} E_{\hat{P}}\, \hat{f}^{*}(x)$ as $B \to \infty$.)
3. The leave-one-out fitted model $\hat{f}^{(-i)}(x)$ is defined as the fitted function from the data set $Z \setminus \{(x_i, y_i)\}$. Repeating the definition in part 1, what is the definition of the bagging estimate of $\hat{f}^{(-i)}(x)$? And what is its limit as $B \to \infty$?
4. (Optional) Let $L(y, f)$ be the loss function for the model $f$ and the target $y$. The out-of-bag estimate on the sample $x_i$ is defined as
\[
\hat{f}_{\mathrm{oob}}(x_i) = \frac{1}{|C^{(-i)}|} \sum_{b \in C^{(-i)}} \hat{f}^{*b}(x_i).
\]
The out-of-bag error is then defined using these out-of-bag samples:
\[
\widehat{\mathrm{Err}}_{\mathrm{oob}} = \frac{1}{N} \sum_{i=1}^{N} L\bigl(y_i, \hat{f}_{\mathrm{oob}}(x_i)\bigr),
\]
where $C^{(-i)}$ is the set of indices $b$ of the bootstrap samples that do not contain observation $i$, and $|C^{(-i)}|$ is the number of such samples.
The leave-one-out cross-validation error for the model $\hat{f}$ is defined as
\[
\widehat{\mathrm{Err}}_{\mathrm{CV}} = \frac{1}{N} \sum_{i=1}^{N} L\bigl(y_i, \hat{f}^{(-i)}(x_i)\bigr).
\]
Please show that the leave-one-out CV error of the bagging estimate $\hat{f}_{\mathrm{bag}}$ approaches its OOB error $\widehat{\mathrm{Err}}_{\mathrm{oob}}$ in probability as $B \to \infty$, and that they share the same limit
\[
\frac{1}{N} \sum_{i=1}^{N} L\bigl(y_i,\, E_{\hat{P}^{(-i)}} \hat{f}^{(-i)*}(x_i)\bigr),
\]
where $\hat{P}^{(-i)}$ is the empirical distribution putting equal probability $1/(N-1)$ on each of the data points except for $(x_i, y_i)$.
(Remark: As a result, for bagging and random forests, we can use the OOB error instead of the CV error to validate the model and determine the tuning parameters; a small numerical sketch of this equivalence follows the exercise.)
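The simulation below is not part of the exercise; it is a rough numerical illustration, on synthetic data with squared-error loss and an arbitrary finite B, of the claim in part 4 that the OOB error and the LOO-CV error of the bagging estimate are close. The helper ols implements the closed-form coefficients from part 1, and drawing N − 1 bootstrap points for each leave-one-out fit is one reading of the definition of the distribution P̂^(−i).

# Numerical sketch for Exercise 4: OOB error vs. LOO-CV error of a bagged linear fit.
import numpy as np

rng = np.random.default_rng(0)
N, B = 30, 1000
x = rng.uniform(0.0, 1.0, N)
y = 2.0 * x + rng.normal(0.0, 0.3, N)      # synthetic data, arbitrary true model

def ols(xs, ys):
    # Closed-form simple-regression coefficients (w0, w1) from part 1.
    n = len(xs)
    denom = n * np.sum(xs ** 2) - np.sum(xs) ** 2
    w1 = (n * np.sum(xs * ys) - np.sum(xs) * np.sum(ys)) / denom
    w0 = (np.sum(xs ** 2) * np.sum(ys) - np.sum(xs) * np.sum(xs * ys)) / denom
    return w0, w1

# B bootstrap fits on the full data; track which observations each fit saw.
preds = np.empty((B, N))
in_bag = np.zeros((B, N), dtype=bool)
for b in range(B):
    idx = rng.integers(0, N, N)
    in_bag[b, idx] = True
    w0, w1 = ols(x[idx], y[idx])
    preds[b] = w0 + w1 * x

# OOB estimate: average only the fits whose bootstrap sample missed observation i.
oob_pred = np.array([preds[~in_bag[:, i], i].mean() for i in range(N)])
err_oob = np.mean((y - oob_pred) ** 2)

# LOO-CV of the bagging estimate: bag B fits on bootstrap draws from Z \ {(x_i, y_i)},
# here with N - 1 draws per bootstrap sample (an assumption, see lead-in above).
loo_pred = np.empty(N)
for i in range(N):
    keep = np.delete(np.arange(N), i)
    total = 0.0
    for _ in range(B):
        idx = rng.choice(keep, size=N - 1, replace=True)
        w0, w1 = ols(x[idx], y[idx])
        total += w0 + w1 * x[i]
    loo_pred[i] = total / B
err_cv = np.mean((y - loo_pred) ** 2)

print(f"OOB error:               {err_oob:.4f}")
print(f"LOO-CV error of bagging: {err_cv:.4f}")   # the two should be close for large B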