
Assignment Five Solution

Provide credit to any sources other than the course staff that helped you solve the problems. This includes all students you talked to regarding the problems.

You can look up definitions/basics online (e.g., Wikipedia, Stack Exchange).

Submission rules are the same as in previous assignments.

Please write your net-id at the top of every page. It helps with grading.

Problem 1. (15 points). SVMs obtain non-linear decision boundaries by mapping the feature vectors $\vec{X} \in \mathbb{R}^d$ to a possibly high-dimensional space via a function $\phi : \mathbb{R}^d \to \mathbb{R}^m$, and then finding a linear decision boundary in the new space. We also saw that to implement an SVM, it suffices to know the kernel function $K(\vec{X}_i, \vec{X}_j) = \phi(\vec{X}_i) \cdot \phi(\vec{X}_j)$, without even explicitly specifying the function $\phi$.

Recall Mercer's theorem: $K$ is a kernel function if and only if for any $n$ vectors $\vec{X}_1, \ldots, \vec{X}_n \in \mathbb{R}^d$ and any real numbers $c_1, \ldots, c_n$,
$$\sum_{i=1}^{n} \sum_{j=1}^{n} c_i c_j K(\vec{X}_i, \vec{X}_j) \geq 0.$$
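(Not part of the original statement: a numeric sanity check.) The double sum above is exactly the quadratic form $c^\top G c$ for the Gram matrix $G_{ij} = K(\vec{X}_i, \vec{X}_j)$, so Mercer's condition can be spot-checked by sampling points and inspecting the eigenvalues of $G$: a strictly negative eigenvalue certifies that $K$ is not a kernel, while non-negative eigenvalues are merely consistent with $K$ being one. A minimal Python sketch (the helper name is ours, not part of the assignment), which may also help probe the candidate kernels in parts 2 and 3 below:

    import numpy as np

    def min_gram_eigenvalue(kernel, points):
        # Smallest eigenvalue of the Gram matrix G[i, j] = kernel(x_i, x_j).
        gram = np.array([[kernel(xi, xj) for xj in points] for xi in points])
        return np.linalg.eigvalsh(gram).min()

    rng = np.random.default_rng(0)
    xs = rng.normal(size=10)  # ten random points in R (d = 1)

    # x * y is a kernel (phi = identity), so this should be >= 0
    # up to floating-point error:
    print(min_gram_eigenvalue(lambda x, y: x * y, xs))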


















1. Prove the following half of Mercer's theorem (which we showed in class): if $K$ is a kernel, then
$$\sum_{i=1}^{n} \sum_{j=1}^{n} c_i c_j K(\vec{X}_i, \vec{X}_j) \geq 0.$$


























2. Let $d = 1$, and $x, y \in \mathbb{R}$. Is the function $K(x, y) = x + y$ a kernel?












3. Let $d = 1$, and $x, y \in \mathbb{R}$. Is $K(x, y) = xy + 1$ a kernel?


















4. Suppose $d = 2$; namely, the original features are of the form $\vec{X}_i = [X_1, X_2]$. Show that $K(\vec{X}, \vec{Y}) = (1 + \vec{X} \cdot \vec{Y})^2$ is a kernel function. This is called the quadratic kernel. (Hint: Find a $\phi : \mathbb{R}^2 \to \mathbb{R}^m$, for some $m$, such that $\phi(\vec{X}) \cdot \phi(\vec{Y}) = (1 + \vec{X} \cdot \vec{Y})^2$.) A numeric spot-check sketch appears after the figure below.






5. Consider the training examples $\langle [0, 1], 1 \rangle$, $\langle [1, 2], 1 \rangle$, $\langle [-1, 2], 1 \rangle$, $\langle [0, 11], 1 \rangle$, $\langle [3, 4], -1 \rangle$, $\langle [-3, 4], -1 \rangle$, $\langle [1, -1], -1 \rangle$, $\langle [-1, -1], -1 \rangle$. We have plotted the data points below; a numeric spot-check sketch follows the figure.

Is the data linearly classifiable in the original 2-d space? If yes, please come up with any linear decision boundary that separates the data. If no, please explain why.

Is the data linearly classifiable in the feature space corresponding to the quadratic kernel? If yes, please come up with any linear decision boundary that separates the data. If no, please explain why.

[Figure: the eight training points plotted in the 2-d plane; x-axis from -8 to 8, y-axis from -5 to 15.]
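(Not part of the original problem: a numeric sketch for parts 4 and 5.) The feature map `phi` below is one standard candidate for the quadratic kernel, stated here as an assumption to be verified numerically rather than as the unique answer; the SVC runs only give numeric evidence about separability, not a proof:

    import numpy as np
    from sklearn.svm import SVC

    # Part 4: check that phi(X) . phi(Y) reproduces (1 + X . Y)^2 on random inputs.
    def phi(x):
        x1, x2 = x
        return np.array([1.0, np.sqrt(2) * x1, np.sqrt(2) * x2,
                         x1**2, x2**2, np.sqrt(2) * x1 * x2])

    rng = np.random.default_rng(0)
    for _ in range(5):
        x, y = rng.normal(size=2), rng.normal(size=2)
        print(np.isclose((1 + x @ y) ** 2, phi(x) @ phi(y)))  # expect True

    # Part 5: probe separability with a nearly hard-margin SVM (huge C).
    # Perfect training accuracy suggests the data is separable in the
    # corresponding feature space; anything less suggests it is not.
    X = np.array([[0, 1], [1, 2], [-1, 2], [0, 11],
                  [3, 4], [-3, 4], [1, -1], [-1, -1]], dtype=float)
    y = np.array([1, 1, 1, 1, -1, -1, -1, -1])

    for kern in ("linear", "poly"):
        # kernel="poly" with gamma=1, coef0=1, degree=2 is (1 + X . Y)^2
        clf = SVC(kernel=kern, degree=2, gamma=1, coef0=1, C=1e6).fit(X, y)
        print(kern, "training accuracy:", clf.score(X, y))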

Problem 2. (10 points). The Gaussian kernel (also called the Radial Basis Function (RBF) kernel) is
$$K(\vec{X}, \vec{Y}) = \exp\left( -\frac{\|\vec{X} - \vec{Y}\|_2^2}{2\sigma^2} \right),$$
where $\vec{X}, \vec{Y}$ are feature vectors in $d$ dimensions. Suppose $d = 1$, and $2\sigma^2 = 1$.

1. Design a function $\phi : \mathbb{R} \to \mathbb{R}^m$ that corresponds to the Gaussian kernel for $d = 1$ and $2\sigma^2 = 1$.

Hint: Use the Taylor series expansion of the exponential function.

2. What is the value of $m$ you end up with?
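(An intuition sketch only, not the required derivation; it assumes $2\sigma^2 = 1$, so that $K(x, y) = e^{-(x - y)^2}$.) Writing $e^{-(x-y)^2} = e^{-x^2} e^{-y^2} e^{2xy}$ and Taylor-expanding the last factor gives
$$e^{-(x-y)^2} = \sum_{k=0}^{\infty} \left( e^{-x^2} \sqrt{\frac{2^k}{k!}}\, x^k \right) \left( e^{-y^2} \sqrt{\frac{2^k}{k!}}\, y^k \right),$$
which suggests the coordinates $\phi_k(x) = e^{-x^2} \sqrt{2^k / k!}\, x^k$ for $k = 0, 1, 2, \ldots$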

Problem 3. (10 points). Let $f$ and $h_i$, $1 \leq i \leq n$, be real-valued functions and let $\lambda \in \mathbb{R}^n$. Let $L(z, \lambda) = f(z) + \sum_{i=1}^{n} \lambda_i h_i(z)$. In this problem, we will prove that the following two optimization problems are equivalent:
$$\min_z f(z) \quad \text{s.t. } h_i(z) \leq 0, \; i = 1, \ldots, n, \tag{1}$$
$$\min_z \max_{\lambda \geq 0} L(z, \lambda). \tag{2}$$

Let $(z^*, \lambda^*)$ be the solution of (2) and let $z_p$ be the solution of (1). Prove that
$$L(z^*, \lambda^*) = f(z_p).$$

Hint: Use the fact that for any $z$ and any $\lambda \geq 0$, $L(z^*, \lambda) \leq L(z^*, \lambda^*)$ and $L(z^*, \lambda^*) \leq L(z, \lambda_z)$, where $\lambda_z = \arg\max_{\lambda \geq 0} L(z, \lambda)$.

You may follow the steps below, but this is not required as long as your proof is correct.

1. Prove that $L(z^*, \lambda^*) \leq f(z_p)$.
2. Prove that $L(z^*, \lambda^*) \geq f(z_p)$.
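(An observation that may help with both steps; a sketch, not the required proof.) For any fixed $z$,
$$\max_{\lambda \geq 0} L(z, \lambda) = \begin{cases} f(z) & \text{if } h_i(z) \leq 0 \text{ for all } i, \\ +\infty & \text{otherwise,} \end{cases}$$
since each term $\lambda_i h_i(z)$ is maximized at $\lambda_i = 0$ when $h_i(z) \leq 0$ and can be driven to $+\infty$ when $h_i(z) > 0$. Hence the outer minimization in (2) effectively restricts attention to the feasible set of (1).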

Problem 4. (25 points). SVM Classification. Please refer to the Jupyter Notebook in the assignment, and complete the coding part in it! You can use the sklearn SVM package: https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC
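(Not a substitute for the notebook, whose dataset and required steps are not reproduced here; a minimal sketch of the sklearn.svm.SVC fit/predict API on synthetic data.)

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))                       # 200 points in 2-d
    y = np.where(X[:, 0]**2 + X[:, 1]**2 < 1.0, 1, -1)  # non-linearly separable labels

    clf = SVC(kernel="rbf", C=1.0, gamma="scale")       # RBF-kernel SVM
    clf.fit(X, y)
    print("training accuracy:", clf.score(X, y))
    print("prediction at the origin:", clf.predict([[0.0, 0.0]]))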
























































