1. Consider the email spam classification problem of Murphy Problem 8.1. Suppose you intend to use a linear perceptron classifier on that data (not logistic regression as directed in Problem 8.1). In the parts below, unless stated otherwise, assume the dataset of N = 4601 samples is split into N_Tr = 3000 for training and N_Test = 1601 for testing. Also, for the tolerance δ in the VC generalization bound, use δ = 0.1 (for a certainty of 0.9). The parts below have short answers.
Hint: You may use the relation that if H is the hypothesis set of linear perceptron classifiers in D dimensions (D features), then d_VC(H) = D + 1. (This will be proved in Problem 2.)
a) What is the VC dimension of the hypothesis set?
b) Expressing the upper bound on the out-of-sample error as
E_out(h_g) ≤ E_in(h_g) + ε_VC,
with E_in(h_g) measured on the training data, use d_VC from part (a) to get a value for ε_VC. (A numerical sketch for parts (b)-(d) and (f) follows part (f).)
c) To get a lower ε_VC, suppose you reduce the number of features to D = 10, and also increase the training set size to N_Tr = 10,000. Now what is ε_VC?
d) Suppose that you had control over the number of training samples N_Tr (by collecting more email data). How many training samples would ensure a generalization error of ε_VC = 0.1, again with probability 0.9 (the same tolerance δ = 0.1), and using the reduced feature set (10 features)?
e) Instead, suppose you use the test set to measure E_in(h_g); let's call it E_test(h_g). What is the hypothesis set now? What is its cardinality?
f) Continuing from part (e), use the bound
E_out(h_g) ≤ E_test(h_g) + ε.
Use the original feature set and the original test set, so that N_Test = 1601. Give an appropriate expression for ε and calculate it numerically.
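The following is a minimal numerical sketch for checking parts (b)-(d) and (f). It assumes the AML form of the VC bound, ε_VC = sqrt((8/N) ln(4 m_H(2N)/δ)) with the polynomial growth-function bound m_H(N) ≤ N^d_VC + 1, the bound ε = sqrt(ln(2M/δ)/(2N)) for a finite hypothesis set of size M, and D = 57 features for the spam data; treat all of these as assumptions to check against your course's conventions.

```python
import math

def eps_vc(n, d_vc, delta=0.1):
    """VC generalization bound (AML form, assumed), using the
    polynomial growth-function bound m_H(N) <= N^d_vc + 1 at 2N."""
    m_h = (2 * n) ** d_vc + 1
    return math.sqrt((8 / n) * math.log(4 * m_h / delta))

def eps_test(n, m=1, delta=0.1):
    """Hoeffding bound for a finite hypothesis set of size m
    (m = 1 when a single hypothesis is fixed before testing)."""
    return math.sqrt(math.log(2 * m / delta) / (2 * n))

def n_for_eps(target, d_vc, delta=0.1):
    """Smallest N (coarse geometric search) with eps_vc <= target."""
    n = 1000
    while eps_vc(n, d_vc, delta) > target:
        n = int(n * 1.1)
    return n

print(eps_vc(3000, 58))       # (b), assuming D = 57 spam features
print(eps_vc(10_000, 11))     # (c), D = 10 and N_Tr = 10,000
print(n_for_eps(0.1, 11))     # (d), N needed for eps_vc = 0.1
print(eps_test(1601))         # (f), test-set bound with N_Test = 1601
```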
2. AML Exercise 2.4 (page 52). In addition to the hints given in the book, you can solve the problem by following the steps outlined below.
For part (a):
i. Write a point x_i as a (d+1)-dimensional vector;
ii. Construct the (d+1) × (d+1) matrix X suggested by the book;
iii. Write h(X), the output of the perceptron, as a function of X and the weights w (note that h(X) is a (d+1)-dimensional vector with elements +1 and −1);
iv. Using the nonsingularity of X, justify how any h(X) can be obtained.
For part (b):
i. Write a point x_k as a linear combination of the other d+1 points;
ii. Write h(x_k) (the output for the chosen point) and substitute for x_k the expression just found in the previous item (Hint: use the sgn{·} function);
iii. What part of your expression in (ii) determines the class assignment of each point x_i, for i ≠ k?
iv. You have just proven (part (a)) that h(X) with X of size (d+1) × (d+1) can be shattered. When we add a (d+2)th row to X, can it still be shattered? In other words, can you choose the value of h(x_k)? Justify your answer. Hint: you can choose the class labels of the other d+1 points. (A numerical illustration of part (a) follows below.)
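As a quick numerical illustration of part (a): if the (d+1) × (d+1) matrix X of points in homogeneous coordinates is nonsingular, then for any desired ±1 labeling y you can solve Xw = y exactly, so sign(Xw) = y. The sketch below uses an arbitrary dimension and random points (illustrative choices, not part of the problem):

```python
import itertools

import numpy as np

d = 3                                   # illustrative input dimension
rng = np.random.default_rng(0)

# d+1 points as rows in homogeneous coordinates [1, x_i^T]; random points
# are almost surely in general position, so X is nonsingular.
X = np.hstack([np.ones((d + 1, 1)), rng.standard_normal((d + 1, d))])
assert np.linalg.matrix_rank(X) == d + 1

# For each of the 2^(d+1) labelings, solve X w = y and check the signs.
for labels in itertools.product([-1.0, 1.0], repeat=d + 1):
    y = np.array(labels)
    w = np.linalg.solve(X, y)           # perceptron weights realizing y
    assert np.array_equal(np.sign(X @ w), y)

print(f"all {2 ** (d + 1)} labelings of {d + 1} points realized, so d_VC >= d + 1")
```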
3. AML Problem 2.24 (page 75), except
▪ Replace part (a) with:
(a.1) For a single given dataset D, give an expression for g^(D)(x). (AML notation.)
(a.2) Find the average function ḡ(x) analytically; express your answer in simplest form.
▪ For parts (b) and (c), obtain E_D{E_out} by direct numerical computation, not by adding bias and var. (A Monte Carlo sketch follows below.)
▪ For part (d), obtain bias(x), var(x), bias, var, and E_D{E_out}, all by analytical (pencil-and-paper) techniques.
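For parts (b) and (c), the direct numerical computation of E_D{E_out} can be organized as a Monte Carlo average over datasets. The sketch below assumes the setup of AML Problem 2.24 as commonly stated (two points drawn uniformly from [−1, 1], noiseless target f(x) = x², and g^(D) the line through the two data points); verify these details against your edition before relying on it.

```python
import numpy as np

rng = np.random.default_rng(0)
n_datasets = 10_000                      # Monte Carlo sample of datasets D
x = np.linspace(-1, 1, 1001)             # grid approximating E_x[...]

# Assumed setup: D = {(x1, x1^2), (x2, x2^2)} with x1, x2 ~ Uniform[-1, 1],
# and g^(D)(x) = a x + b is the line through the two points.
x1, x2 = rng.uniform(-1, 1, (2, n_datasets))
a = x1 + x2                              # slope of the interpolating line
b = -x1 * x2                             # intercept

g = np.outer(a, x) + b[:, None]          # g^(D)(x), one row per dataset
f = x ** 2                               # target values on the grid

e_out = ((g - f) ** 2).mean(axis=1)      # E_out(g^(D)) for each dataset
print("E_D[E_out] ~", e_out.mean())      # part (d) should match this analytically
```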
4. AML Problem 2.13 (a), (b).
5. AML Problem 4.4 (a)-(c), plus additional parts (i)-(iii) below.
>> For part (c), assume both g_10(x) and f(x) are given as functions of x, and you can express your answer in terms of them; and define
E_out(g_10) = E_{x,y}{ (g_10(x) − y(x))² }.
(i) In Fig. 4.3(a), set σ² = 0.5, and traverse the horizontal line from N ≈ 60 to N ≈ 130. Explain why ℋ_10 transitions from overfit to good fit (relative to ℋ_2).
(ii) Also in Fig. 4.3(a), set N = 100, and traverse the vertical line from σ² = 0 to σ² = 2. Explain why ℋ_10 transitions from good fit to overfit (relative to ℋ_2).
(iii) In Fig. 4.3(b), set N ≈ 75, and traverse the vertical line from Q_f = 0 to Q_f = 100. Explain the behavior. (A small simulation in the spirit of these experiments follows below.)
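For intuition on (i)-(iii), a small simulation in the spirit of Fig. 4.3 can help: fit ℋ_2 and ℋ_10 to N noisy samples of a fixed target and track the overfit measure E_out(g_10) − E_out(g_2) as N or σ² varies. The target below is an arbitrary degree-5 polynomial, not the normalized Legendre construction the book averages over, so expect the same qualitative trends but not the book's numbers.

```python
import numpy as np

rng = np.random.default_rng(0)
x_test = np.linspace(-1, 1, 2001)        # grid approximating E_x[...]

def f(x):
    # Arbitrary degree-5 target, chosen only for illustration.
    return np.polyval([1.0, -2.0, 0.0, 3.0, 0.0, -1.0], x)

def overfit_measure(n, sigma2, trials=500):
    """Monte Carlo estimate of E_out(g10) - E_out(g2); the noise term
    sigma^2 appears in both E_out values and cancels in the difference."""
    diffs = []
    for _ in range(trials):
        xs = rng.uniform(-1, 1, n)
        ys = f(xs) + rng.normal(0.0, np.sqrt(sigma2), n)
        g2 = np.polyval(np.polyfit(xs, ys, 2), x_test)
        g10 = np.polyval(np.polyfit(xs, ys, 10), x_test)
        diffs.append(np.mean((g10 - f(x_test)) ** 2)
                     - np.mean((g2 - f(x_test)) ** 2))
    return np.mean(diffs)

# (i): fixed noise, growing N -- the overfit measure should fall.
for n in (60, 90, 130):
    print(n, overfit_measure(n, sigma2=0.5))
# (ii): fixed N, growing noise -- the overfit measure should rise.
for s2 in (0.0, 1.0, 2.0):
    print(s2, overfit_measure(100, sigma2=s2))
```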