Homework
Exercise 1
Log into “cookdata.cn”, enroll in the designated course, and finish the online exercises there.
Exercise 2 (Decision Tree)
You are trying to determine whether a boy finds a particular type of food appealing based on the food’s temperature, taste, and size.
Food Sample Id   Appealing   Temperature   Taste   Size
 1               No          Hot           Salty   Small
 2               No          Cold          Sweet   Large
 3               No          Cold          Sweet   Large
 4               Yes         Cold          Sour    Small
 5               Yes         Hot           Sour    Small
 6               No          Hot           Salty   Large
 7               Yes         Hot           Sour    Large
 8               Yes         Cold          Sweet   Small
 9               Yes         Cold          Sweet   Small
10               No          Hot           Salty   Large
1. What is the initial entropy of “Appealing”?
2. Assume that “Taste” is chosen as the root of the decision tree. What is the information gain associated with this attribute?
3. Draw the full decision tree learned from this data (without any pruning).
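For a quick numerical check of questions 1 and 2, the entropy and information-gain computations can be scripted directly from the table above. The following is a minimal Python sketch; the use of pandas and the column names are my own choices, not part of the exercise.

from collections import Counter
from math import log2

import pandas as pd

# Training data transcribed from the table above.
data = pd.DataFrame({
    "Appealing":   ["No", "No", "No", "Yes", "Yes", "No", "Yes", "Yes", "Yes", "No"],
    "Temperature": ["Hot", "Cold", "Cold", "Cold", "Hot", "Hot", "Hot", "Cold", "Cold", "Hot"],
    "Taste":       ["Salty", "Sweet", "Sweet", "Sour", "Sour", "Salty", "Sour", "Sweet", "Sweet", "Salty"],
    "Size":        ["Small", "Large", "Large", "Small", "Small", "Large", "Large", "Small", "Small", "Large"],
})

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def information_gain(df, attribute, target="Appealing"):
    """Entropy of the target minus the weighted entropy after splitting on `attribute`."""
    n = len(df)
    remainder = sum(
        (len(group) / n) * entropy(group[target])
        for _, group in df.groupby(attribute)
    )
    return entropy(df[target]) - remainder

print("Initial entropy:", entropy(data["Appealing"]))    # question 1
print("Gain(Taste):", information_gain(data, "Taste"))   # question 2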
Exercise 3 (Maximum Likelihood Estimation (MLE))
Suppose that the samples $\{x_i\}_{i=1}^{n}$ are drawn from the normal distribution $N(\mu, \sigma^2)$ with p.d.f.
$$f_\theta(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right),$$
where $\theta = (\mu, \sigma^2)$. The maximum likelihood estimator (MLE) of $\theta$ is the one that maximizes the likelihood function
$$L(\theta) = \prod_{i=1}^{n} f_\theta(x_i).$$
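A step not written out in the original statement, but standard: maximizing $L(\theta)$ is equivalent to maximizing the log-likelihood, which for this model reads
$$\ell(\theta) = \log L(\theta) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2,$$
so the estimators asked for in question 1 follow from setting $\partial\ell/\partial\mu = 0$ and $\partial\ell/\partial\sigma^2 = 0$.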
1. Show that the MLE of the parameters $(\mu, \sigma^2)$ is
$$\hat\mu = \bar x = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar x)^2.$$
2. Show that
$$E\hat\mu = \mu, \qquad E\!\left[\frac{n}{n-1}\,\hat\sigma^2\right] = \sigma^2,$$
where $E$ is the expectation. This means that $\hat\mu$ is an unbiased estimator of $\mu$, but $\hat\sigma^2$ is a biased estimator of $\sigma^2$.
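The bias claim in question 2 can be sanity-checked numerically. Below is a minimal Python sketch; the parameter values, sample size, and replication count are arbitrary choices of mine, not part of the exercise.

import numpy as np

rng = np.random.default_rng(0)
mu, sigma2 = 2.0, 4.0        # true parameters (arbitrary)
n, reps = 10, 100_000        # sample size and Monte Carlo replications

samples = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
mu_hat = samples.mean(axis=1)               # MLE of mu for each replication
sigma2_hat = samples.var(axis=1, ddof=0)    # MLE of sigma^2 (divides by n)

print("E[mu_hat] ~", mu_hat.mean())                                  # close to mu = 2.0
print("E[sigma2_hat] ~", sigma2_hat.mean())                          # close to (n-1)/n * sigma2 = 3.6
print("E[n/(n-1) sigma2_hat] ~", (n / (n - 1)) * sigma2_hat.mean())  # close to sigma2 = 4.0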
Exercise 4 (MLE for Naive Bayes methods)
Suppose that $X$ and $Y$ are a pair of discrete random variables, i.e., $X \in \{1, 2, \dots, t\}$, $Y \in \{1, 2, \dots, c\}$. Then the probability distribution of $Y$ is solely determined by the set of parameters $\{p_k\}_{k=1}^{c}$, where $p_k = \Pr(Y = k)$ with $\sum_{k=1}^{c} p_k = 1$. Similarly, the conditional probability distribution of $X$ given $Y$ is solely determined by the set of parameters $\{p_{sk}\}_{s=1,\dots,t;\ k=1,\dots,c}$, where $p_{sk} = \Pr(X = s \mid Y = k)$ with $\sum_{s=1}^{t} p_{sk} = 1$. Now we have a set of samples $\{(x_i, y_i)\}_{i=1}^{n}$ drawn independently from the joint distribution $\Pr(X, Y)$. Prove that the MLE of the parameter $p_k$ (prior probability) is
$$\hat p_k = \frac{\sum_{i=1}^{n} I(y_i = k)}{n}, \qquad k = 1, \dots, c;$$
and the MLE of the parameter $p_{sk}$ is
$$\hat p_{sk} = \frac{\sum_{i=1}^{n} I(x_i = s,\ y_i = k)}{\sum_{i=1}^{n} I(y_i = k)}, \qquad s = 1, \dots, t,\ k = 1, \dots, c.$$
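The counting estimators to be derived here are easy to state in code. The sketch below is my own illustration, with a made-up toy sample; it simply computes the empirical frequencies that the MLE is claimed to reduce to.

from collections import Counter

# Toy sample (x_i, y_i) with X in {1,...,t} and Y in {1,...,c}; values are made up.
pairs = [(1, 1), (2, 1), (1, 2), (3, 2), (2, 2), (1, 1), (3, 1), (2, 2)]
n = len(pairs)

y_counts = Counter(y for _, y in pairs)   # sum_i I(y_i = k)
xy_counts = Counter(pairs)                # sum_i I(x_i = s, y_i = k)

# MLE of the class priors: p_hat_k = (# of samples with y_i = k) / n
p_hat = {k: cnt / n for k, cnt in y_counts.items()}

# MLE of the class-conditional probabilities:
# p_hat_sk = (# with x_i = s and y_i = k) / (# with y_i = k)
p_hat_sk = {(s, k): cnt / y_counts[k] for (s, k), cnt in xy_counts.items()}

print(p_hat)
print(p_hat_sk)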
Exercise 5 (Error bound for the 1-nearest-neighbor method) In class, we estimated that the error of the 1-nearest-neighbor rule is roughly twice the Bayes error. Now let us make this more rigorous.
Let us consider the two-class classification problem with $\mathcal{X} = [0, 1]^d$ and $\mathcal{Y} = \{0, 1\}$. The underlying joint probability distribution on $\mathcal{X} \times \mathcal{Y}$ is $P(X, Y)$, from which we deduce that the marginal distribution of $X$ is $p_X(x)$ and the conditional probability distribution is $\eta(x) = P(Y = 1 \mid X = x)$. Assume that $\eta(x)$ is $c$-Lipschitz continuous: $|\eta(x) - \eta(x')| \le c\,\|x - x'\|$ for any $x, x' \in \mathcal{X}$. Recall that the Bayes rule is $f^*(x) = \mathbb{1}_{\{\eta(x) > 1/2\}}$. Given a training set $S = \{(x_i, y_i)\}_{i=1}^{n}$ with $(x_i, y_i) \overset{\text{i.i.d.}}{\sim} P$ (or equivalently $S \sim P^n$), the 1-nearest-neighbor rule is $f^{1\mathrm{NN}}(x) = y_{\pi_S(x)}$, where $\pi_S(x) = \arg\min_i \|x - x_i\|$.
Define the generalization error of a rule $f$ as $\mathcal{E}(f) = E_{(X,Y) \sim P}\,\mathbb{1}_{Y \ne f(X)}$. Show that
$$E_{S \sim P^n}\,\mathcal{E}(f^{1\mathrm{NN}}) \le 2\,\mathcal{E}(f^*) + c\,E_{S \sim P^n} E_{x \sim p_X} \|x - x_{\pi_S(x)}\|.$$
(This means that we can have a precise error estimate for the 1-nearest-neighbor rule if we can bound $E_{S \sim P^n} E_{x \sim p_X} \|x - x_{\pi_S(x)}\|$.)
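For concreteness, the 1-nearest-neighbor rule $f^{1\mathrm{NN}}$ defined above amounts to the following minimal Python sketch; the brute-force search is just a straightforward reading of the definition, and the training points and query are made up for illustration.

import numpy as np

def one_nn_predict(x, train_X, train_y):
    """1-NN rule: return the label of the training point closest to x in Euclidean norm."""
    distances = np.linalg.norm(train_X - x, axis=1)   # ||x - x_i|| for all i
    return train_y[np.argmin(distances)]              # y_{pi_S(x)}

# Tiny illustration with made-up points in [0, 1]^2 and labels in {0, 1}.
train_X = np.array([[0.1, 0.2], [0.8, 0.9], [0.4, 0.4]])
train_y = np.array([0, 1, 0])
print(one_nn_predict(np.array([0.7, 0.8]), train_X, train_y))  # -> 1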