Homework # 4
All questions have multiple-choice answers ([a], [b], [c], ...). You can collaborate with others, but do not discuss the selected or excluded choices in the answers. You can consult books and notes, but not other people's solutions. Your solutions should be based on your own work. Definitions and notation follow the lectures.
Note about the homework
The goal of the homework is to facilitate a deeper understanding of the course material. The questions are not designed to be puzzles with catchy answers. They are meant to make you roll up your sleeves, face uncertainties, and approach the problem from different angles.
The problems range from easy to difficult, and from practical to theoretical. Some problems require running a full experiment to arrive at the answer.
The answer may not be obvious or numerically close to one of the choices, but one (and only one) choice will be correct if you follow the instructions precisely in each problem. You are encouraged to explore the problem further by experimenting with variations on these instructions, for the learning benefit.
You are also encouraged to take part in the forum http://book.caltech.edu/bookforum
where there are many threads about each homework set. We hope that you will contribute to the discussion as well. Please follow the forum guidelines for posting answers (see the "BEFORE posting answers" announcement at the top there).
© 2012-2015 Yaser Abu-Mostafa. All rights reserved. No redistribution in any format. No translation or derivative products without written permission.
Generalization Error
In Problems 1-3, we look at generalization bounds numerically. For $N \geq d_{vc}$, use the simple approximate bound $N^{d_{vc}}$ for the growth function $m_{\mathcal{H}}(N)$.
1. For an $\mathcal{H}$ with $d_{vc} = 10$, if you want 95% confidence that your generalization error is at most 0.05, what is the closest numerical approximation of the sample size that the VC generalization bound predicts?
[a] 400,000
[b] 420,000
[c] 440,000
[d] 460,000
[e] 480,000
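If you want to check Problem 1 numerically, one approach is to treat the VC bound as a fixed-point equation in $N$. Here is a minimal Python sketch, assuming the simple bound $m_{\mathcal{H}}(N) \approx N^{d_{vc}}$ given above; the starting guess and iteration count are arbitrary choices:

```python
import math

# Problem 1 sketch: the VC bound rearranges to
#   N >= (8 / eps^2) * ln(4 * (2N)^dvc / delta),
# which can be solved by fixed-point iteration.
dvc, eps, delta = 10, 0.05, 0.05

N = 1000.0  # arbitrary initial guess
for _ in range(100):
    N = (8 / eps**2) * math.log(4 * (2 * N)**dvc / delta)

print(N)  # compare against the answer choices
```

For these parameter values the iteration settles quickly, since the right-hand side varies only logarithmically in $N$.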
2. There are a number of bounds on the generalization error $\epsilon$, all holding with probability at least $1 - \delta$. Fix $d_{vc} = 50$ and $\delta = 0.05$ and plot these bounds as a function of $N$. Which bound is the smallest for very large $N$, say $N = 10{,}000$? Note that [c] and [d] are implicit bounds in $\epsilon$.
[a] Original VC bound: $\epsilon \le \sqrt{\frac{8}{N} \ln \frac{4 m_{\mathcal{H}}(2N)}{\delta}}$

[b] Rademacher Penalty Bound: $\epsilon \le \sqrt{\frac{2 \ln(2 N m_{\mathcal{H}}(N))}{N}} + \sqrt{\frac{2}{N} \ln \frac{1}{\delta}} + \frac{1}{N}$

[c] Parrondo and Van den Broek: $\epsilon \le \sqrt{\frac{1}{N} \left( 2\epsilon + \ln \frac{6 m_{\mathcal{H}}(2N)}{\delta} \right)}$

[d] Devroye: $\epsilon \le \sqrt{\frac{1}{2N} \left( 4\epsilon(1+\epsilon) + \ln \frac{4 m_{\mathcal{H}}(N^2)}{\delta} \right)}$

[e] They are all equal.
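A possible way to compare the bounds in Problems 2 and 3 is to evaluate all four at a given $N$, working with $\ln m_{\mathcal{H}}(N) \approx d_{vc} \ln N$ to avoid overflow (computing $N^{d_{vc}}$ directly is hopeless for $N = 10{,}000$), and iterating the implicit bounds [c] and [d] to a fixed point. A sketch under those assumptions:

```python
import math

# Problems 2-3 sketch: evaluate the four bounds at a given N,
# using ln m_H(N) ~ dvc * ln(N) to keep the numbers finite.
dvc, delta = 50, 0.05

def ln_m(n):  # log of the approximate growth function N^dvc
    return dvc * math.log(n)

def bounds(N):
    vc = math.sqrt(8 / N * (math.log(4 / delta) + ln_m(2 * N)))
    rad = (math.sqrt(2 * (math.log(2 * N) + ln_m(N)) / N)
           + math.sqrt(2 / N * math.log(1 / delta)) + 1 / N)
    # [c] and [d] are implicit in eps; iterate each to a fixed point
    par = dev = 1.0
    for _ in range(1000):
        par = math.sqrt((2 * par + math.log(6 / delta) + ln_m(2 * N)) / N)
        dev = math.sqrt((4 * dev * (1 + dev)
                         + math.log(4 / delta) + 2 * ln_m(N)) / (2 * N))
    return vc, rad, par, dev

print(bounds(10_000))  # Problem 2: very large N
print(bounds(5))       # Problem 3: small N
```

Note that $\ln m_{\mathcal{H}}(N^2) \approx 2\,d_{vc} \ln N$, which is why the Devroye line uses `2 * ln_m(N)`.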
3. For the same values of $d_{vc}$ and $\delta$ of Problem 2, but for small $N$, say $N = 5$, which bound is the smallest?
[a] Original VC bound: $\epsilon \le \sqrt{\frac{8}{N} \ln \frac{4 m_{\mathcal{H}}(2N)}{\delta}}$

[b] Rademacher Penalty Bound: $\epsilon \le \sqrt{\frac{2 \ln(2 N m_{\mathcal{H}}(N))}{N}} + \sqrt{\frac{2}{N} \ln \frac{1}{\delta}} + \frac{1}{N}$

[c] Parrondo and Van den Broek: $\epsilon \le \sqrt{\frac{1}{N} \left( 2\epsilon + \ln \frac{6 m_{\mathcal{H}}(2N)}{\delta} \right)}$

[d] Devroye: $\epsilon \le \sqrt{\frac{1}{2N} \left( 4\epsilon(1+\epsilon) + \ln \frac{4 m_{\mathcal{H}}(N^2)}{\delta} \right)}$

[e] They are all equal.
Bias and Variance
Consider the case where the target function $f: [-1, 1] \to \mathbb{R}$ is given by $f(x) = \sin(\pi x)$ and the input probability distribution is uniform on $[-1, 1]$. Assume that the training set has only two examples (picked independently), and that the learning algorithm produces the hypothesis that minimizes the mean squared error on the examples.
4. Assume the learning model consists of all hypotheses of the form $h(x) = ax$. What is the expected value, $\bar{g}(x)$, of the hypothesis produced by the learning algorithm (expected value with respect to the data set)? Express your $\bar{g}(x)$ as $\hat{a}x$, and round $\hat{a}$ to two decimal digits only, then match exactly to one of the following answers.
[a] $\bar{g}(x) = 0$
[b] $\bar{g}(x) = 0.79x$
[c] $\bar{g}(x) = 1.07x$
[d] $\bar{g}(x) = 1.58x$
[e] None of the above
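Problem 4 can be attacked by simulation. For two points, the least-squares slope through the origin has the closed form $a = (x_1 y_1 + x_2 y_2)/(x_1^2 + x_2^2)$; the sketch below averages it over many random data sets (the number of runs is an arbitrary choice):

```python
import numpy as np

# Problem 4 sketch: Monte Carlo estimate of a-hat.  Each data set
# is two points; least squares through the origin gives
#   a = (x1*y1 + x2*y2) / (x1^2 + x2^2).
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(1_000_000, 2))
y = np.sin(np.pi * x)
a = (x * y).sum(axis=1) / (x**2).sum(axis=1)
print(round(a.mean(), 2))  # a-hat, rounded to two decimal digits
```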
5. What is the closest value to the bias in this case?
[a] 0.1
[b] 0.3
[c] 0.5
[d] 0.7
[e] 1.0
6. What is the closest value to the variance in this case?
[a] 0.2
[b] 0.4
[c] 0.6
[d] 0.8
[e] 1.0
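For Problems 5 and 6, one possible shortcut: since each learned hypothesis here is $g^{(\mathcal{D})}(x) = a_{\mathcal{D}}\,x$, the variance factors as $\mathrm{Var}(a)\,\mathbb{E}_x[x^2]$, and the bias can be measured against $f$ on a test grid. A self-contained sketch under those assumptions:

```python
import numpy as np

# Problems 5-6 sketch: with g(x) = a*x, the variance term factors
# as Var(a) * E[x^2]; the bias is measured against f on a grid.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(1_000_000, 2))
y = np.sin(np.pi * x)
a = (x * y).sum(axis=1) / (x**2).sum(axis=1)  # per-data-set slope

x_test = np.linspace(-1, 1, 1001)
bias = np.mean((a.mean() * x_test - np.sin(np.pi * x_test)) ** 2)
variance = a.var() * np.mean(x_test**2)
print(bias, variance)
```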
7. Now, let's change $\mathcal{H}$. Which of the following learning models has the least expected value of out-of-sample error?
[a] Hypotheses of the form $h(x) = b$
[b] Hypotheses of the form $h(x) = ax$
[c] Hypotheses of the form $h(x) = ax + b$
[d] Hypotheses of the form $h(x) = ax^2$
[e] Hypotheses of the form $h(x) = ax^2 + b$
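Problem 7 calls for a full experiment. One way, sketched below, is to express each model as a feature map, fit it by least squares on two random points, and estimate bias plus variance on a test grid; the run count is kept modest since this is only a rough estimate:

```python
import numpy as np

# Problem 7 sketch: estimate expected out-of-sample error
# (bias + variance) for each candidate model, fitting two
# random points by least squares through a feature map.
rng = np.random.default_rng(0)
runs = 10_000  # modest; increase for a more stable estimate
x_test = np.linspace(-1, 1, 1001)

def f(x):
    return np.sin(np.pi * x)

models = {  # feature map for each hypothesis form
    "h(x) = b":        lambda x: np.ones((x.size, 1)),
    "h(x) = ax":       lambda x: x[:, None],
    "h(x) = ax + b":   lambda x: np.stack([x, np.ones_like(x)], axis=1),
    "h(x) = ax^2":     lambda x: (x**2)[:, None],
    "h(x) = ax^2 + b": lambda x: np.stack([x**2, np.ones_like(x)], axis=1),
}

for name, phi in models.items():
    preds = np.empty((runs, x_test.size))
    for i in range(runs):
        x = rng.uniform(-1, 1, 2)
        w, *_ = np.linalg.lstsq(phi(x), f(x), rcond=None)
        preds[i] = phi(x_test) @ w
    g_bar = preds.mean(axis=0)
    bias = np.mean((g_bar - f(x_test)) ** 2)
    var = np.mean((preds - g_bar) ** 2)
    print(f"{name}: bias + variance ~ {bias + var:.2f}")
```

The decomposition of expected out-of-sample error into bias plus variance is the one from the lectures; only the feature-map framing and run count above are implementation choices.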
VC Dimension
8. Assume $q \ge 1$ is an integer and let $m_{\mathcal{H}}(1) = 2$. What is the VC dimension of a hypothesis set whose growth function satisfies $m_{\mathcal{H}}(N+1) = 2\, m_{\mathcal{H}}(N) - \binom{N}{q}$? Recall that $\binom{M}{m} = 0$ when $m > M$.
[a] $q - 2$
[b] $q - 1$
[c] $q$
[d] $q + 1$
[e] None of the above
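For Problem 8, you can also explore the recursion numerically: iterate it for a sample value of $q$ and look for the smallest break point, i.e., the smallest $N$ with $m_{\mathcal{H}}(N) < 2^N$; the VC dimension is one less than that. A sketch with an illustrative $q$:

```python
from math import comb

# Problem 8 sketch: iterate the recursion for one sample q and
# locate the smallest break point (smallest N with m_H(N) < 2^N).
q = 3  # illustrative choice; try several values of q
m = {1: 2}
for N in range(1, 20):
    m[N + 1] = 2 * m[N] - comb(N, q)  # comb(N, q) = 0 when q > N

for N in range(1, 21):
    print(N, m[N], 2**N, m[N] < 2**N)
```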
9. For hypothesis sets $\mathcal{H}_1, \mathcal{H}_2, \ldots, \mathcal{H}_K$ with finite, positive VC dimensions $d_{vc}(\mathcal{H}_k)$, some of the following bounds are correct and some are not. Which among the correct ones is the tightest bound (the smallest range of values) on the VC dimension of the intersection of the sets: $d_{vc}\left(\bigcap_{k=1}^{K} \mathcal{H}_k\right)$? (The VC dimension of an empty set or a singleton set is taken as zero.)
[a] $0 \le d_{vc}\left(\bigcap_{k=1}^{K} \mathcal{H}_k\right) \le \sum_{k=1}^{K} d_{vc}(\mathcal{H}_k)$
[b] $0 \le d_{vc}\left(\bigcap_{k=1}^{K} \mathcal{H}_k\right) \le \min\{d_{vc}(\mathcal{H}_k)\}_{k=1}^{K}$
[c] $0 \le d_{vc}\left(\bigcap_{k=1}^{K} \mathcal{H}_k\right) \le \max\{d_{vc}(\mathcal{H}_k)\}_{k=1}^{K}$
[d] $\min\{d_{vc}(\mathcal{H}_k)\}_{k=1}^{K} \le d_{vc}\left(\bigcap_{k=1}^{K} \mathcal{H}_k\right) \le \max\{d_{vc}(\mathcal{H}_k)\}_{k=1}^{K}$
[e] $\min\{d_{vc}(\mathcal{H}_k)\}_{k=1}^{K} \le d_{vc}\left(\bigcap_{k=1}^{K} \mathcal{H}_k\right) \le \sum_{k=1}^{K} d_{vc}(\mathcal{H}_k)$
10. For hypothesis sets $\mathcal{H}_1, \mathcal{H}_2, \ldots, \mathcal{H}_K$ with finite, positive VC dimensions $d_{vc}(\mathcal{H}_k)$, some of the following bounds are correct and some are not. Which among the correct ones is the tightest bound (the smallest range of values) on the VC dimension of the union of the sets: $d_{vc}\left(\bigcup_{k=1}^{K} \mathcal{H}_k\right)$?
[a] $0 \le d_{vc}\left(\bigcup_{k=1}^{K} \mathcal{H}_k\right) \le \sum_{k=1}^{K} d_{vc}(\mathcal{H}_k)$
[b] $0 \le d_{vc}\left(\bigcup_{k=1}^{K} \mathcal{H}_k\right) \le K - 1 + \sum_{k=1}^{K} d_{vc}(\mathcal{H}_k)$
[c] $\min\{d_{vc}(\mathcal{H}_k)\}_{k=1}^{K} \le d_{vc}\left(\bigcup_{k=1}^{K} \mathcal{H}_k\right) \le \sum_{k=1}^{K} d_{vc}(\mathcal{H}_k)$
[d] $\max\{d_{vc}(\mathcal{H}_k)\}_{k=1}^{K} \le d_{vc}\left(\bigcup_{k=1}^{K} \mathcal{H}_k\right) \le \sum_{k=1}^{K} d_{vc}(\mathcal{H}_k)$
[e] $\max\{d_{vc}(\mathcal{H}_k)\}_{k=1}^{K} \le d_{vc}\left(\bigcup_{k=1}^{K} \mathcal{H}_k\right) \le K - 1 + \sum_{k=1}^{K} d_{vc}(\mathcal{H}_k)$