Machine Learning Homework #2 Solution

1.  (20 points) Recall that a function K : X × X → R is a valid kernel function if it is a symmetric and positive semi-definite function. For the current problem, we assume that the domain is X = R.

 

(a)  (10 points) Let K_1, ..., K_m be a set of valid kernel functions. Show that for any w_j ≥ 0, j = 1, ..., m, the function K = Σ_{j=1}^m w_j K_j is a valid kernel function.

(b)  (10 points) Consider the function K(x, x′) = exp(−(x − x′)²/2016), where x, x′ ∈ R. Show that K is a valid kernel function.
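(A hedged sketch of one possible line of reasoning, not necessarily the intended write-up: for part (a), symmetry of K follows since each K_j is symmetric, and for any points x_1, ..., x_n ∈ R and any c ∈ R^n the Gram matrix G of K satisfies

    c^T G c = Σ_{j=1}^m w_j (c^T G_j c) ≥ 0 ,

because each Gram matrix G_j is positive semi-definite and each w_j ≥ 0. For part (b), one common route is to write exp(−(x − x′)²/2016) = exp(−x²/2016) exp(−x′²/2016) exp(2xx′/2016), expand the last factor as a power series, and appeal to closure of valid kernels under sums, products, limits, and scaling by factors of the form f(x)f(x′).)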

 

2.  (30 points) The goal is to evaluate the performance of an optimization algorithm for SVMs which works on the dual. The problem is based on the following paper: “LIBSVM: A Library for Support Vector Machines,” by Chih-Chung Chang and Chih-Jen Lin. Please read Sections 2.1, 4.1, and 6 of the paper. You can also refer to the discussion on SVM and Lagrange duality in class as needed.

Recall that the dual problem for a non-separable linear SVM is a quadratic program on α ∈ R^n, where n is the number of data points and α_i is the Lagrange multiplier corresponding to point i. The quadratic program has a box constraint on each α_i, i.e., 0 ≤ α_i ≤ C, and a linear constraint Σ_{i=1}^n y_i α_i = 0, where y_i ∈ {−1, +1}. The dual QP (as a minimization problem) can be written as:

 

 

    min_{α ∈ R^n}   (1/2) Σ_{i=1}^n Σ_{j=1}^n y_i y_j x_i^T x_j α_i α_j  −  Σ_{i=1}^n α_i
    subject to      0 ≤ α_i ≤ C ,   ∀i ,
                    Σ_{i=1}^n y_i α_i = 0 .                                          (1)

 

Note that the above is the same as (2) in the above paper.
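For reference when plotting the dual objective over iterations, the objective in (1) can be evaluated cheaply for a given α. The sketch below is only illustrative and is not part of the required interface; the names dual_objective, X, and y are assumptions, and the linear kernel lets the double sum be computed as a squared norm.

    import numpy as np

    def dual_objective(alpha, X, y):
        """Minimization-form dual objective in (1) for a linear kernel.

        alpha : (n,) Lagrange multipliers
        X     : (n, d) data matrix, one point per row
        y     : (n,) labels in {-1, +1}
        """
        # 0.5 * sum_ij y_i y_j x_i^T x_j a_i a_j  =  0.5 * || sum_i y_i a_i x_i ||^2
        v = (y * alpha) @ X
        return 0.5 * (v @ v) - alpha.sum()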

 

Implement Algorithm 1 in Section 4.1.1 of the paper using ideas in Sections 4.1 and 6, and evaluate its performance on the MNIST-13 dataset. The dataset contains two classes, labeled as 1 and 3 in the first column of the csv file we provided. All other columns are data values. The evaluation will be based on the optimization performance, not classification accuracy. So, we will use the entire dataset for training (optimization). For the runtime results, run the code 5 times on the entire dataset and report the average runtime and standard deviation, to account for any system-level fluctuations (file I/O, other processes, etc.).
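One simple way to obtain the average runtime and standard deviation over the 5 runs is sketched below; the helper name timed_runs and the callable run_once are illustrative assumptions, not part of the required interface.

    import time
    import numpy as np

    def timed_runs(run_once, numruns=5):
        """Average wall-clock runtime and standard deviation over numruns executions."""
        times = []
        for _ in range(numruns):
            start = time.perf_counter()
            run_once()                 # e.g., one full optimization pass over the dataset
            times.append(time.perf_counter() - start)
        return float(np.mean(times)), float(np.std(times))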

For each algorithm, you will have to submit (i) a summary of methods and results, and (ii) code:

(i)  Summary of methods and results: Algorithm 1 is a ‘decomposition method’ which iteratively updates α by updating two components (α_i, α_j) (called the working set) in each iteration while keeping the remaining (n − 2) components fixed. Clearly describe the sub-problem on (α_i, α_j), how the sub-problem can be solved efficiently, and how the working set is selected in each iteration (a minimal sketch of one such two-variable update is given after this list). Report the average runtime over 5 runs on the entire dataset, along with the standard deviation. In a figure, plot the value of the dual objective function with increasing number of iterations for each run, i.e., five plots overlaid on the same figure. The dual objective (in minimization form) should decrease over iterations to convergence.

Termination condition: Please use a combination of the following two termination conditions: (1) after the current iteration, the improvement in the objective function is smaller than 10^-4; or (2) the total number of iterations is greater than 1000. The algorithm terminates if either condition (1) or (2) is satisfied.

(ii)  Code: You will have  to submit  code for mySmoSVM(filename, numruns) (main  file).

This main file has input: (1) a filename (including extension and absolute path) containing the dataset and (2) the number of runs, and output: (1) the average runtime and standard deviation printed to the terminal (stdout). The function must take the inputs in this order and display the output via the terminal. The filename will correspond to a plain text file for a dataset, with each line corresponding to a data point: the first entry will be the label and the rest of the entries will be feature values of the data point. The data for the plots can be written in a file tmp in the same directory, but (i) we will not be using this data for doing plots, and (ii) you do not have to explain how you did the plots from this data.

You can submit  additional files/functions (as needed)  which will be used by the  main file. Put  comments in your code so that one can follow the key parts  and steps in your code.
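For orientation, a minimal sketch of the kind of two-variable update Algorithm 1 performs is given below, assuming a linear kernel and the first-order maximal-violating-pair selection of Section 4.1. It is only illustrative: the names (smo_sketch, Q, grad, etc.) are assumptions, it stops on a KKT-violation tolerance rather than the objective-improvement/1000-iteration rule required above, and it omits LIBSVM's caching, shrinking, and second-order selection. Your mySmoSVM must still read the csv file, time the runs, and report results as specified.

    import numpy as np

    def smo_sketch(X, y, C=1.0, max_iter=1000, tol=1e-4):
        """Illustrative two-variable decomposition loop for the dual QP (1), linear kernel."""
        n = X.shape[0]
        K = X @ X.T                                  # K_ij = x_i^T x_j
        Q = (y[:, None] * y[None, :]) * K            # Q_ij = y_i y_j x_i^T x_j
        alpha = np.zeros(n)
        grad = -np.ones(n)                           # grad f(alpha) = Q alpha - e; here alpha = 0
        history = []
        for _ in range(max_iter):
            # working set selection: maximal violating pair (first-order rule)
            up  = ((y > 0) & (alpha < C)) | ((y < 0) & (alpha > 0))
            low = ((y > 0) & (alpha > 0)) | ((y < 0) & (alpha < C))
            score = -y * grad
            i = int(np.argmax(np.where(up,  score, -np.inf)))
            j = int(np.argmin(np.where(low, score,  np.inf)))
            if score[i] - score[j] < tol:            # approximate KKT conditions hold
                break
            # analytic solution of the 2-variable sub-problem along d_i = y_i, d_j = -y_j
            a = max(K[i, i] + K[j, j] - 2.0 * K[i, j], 1e-12)
            t = (score[i] - score[j]) / a            # unconstrained optimal step length
            t = min(t,                               # clip so both alphas stay in [0, C]
                    C - alpha[i] if y[i] > 0 else alpha[i],
                    alpha[j] if y[j] > 0 else C - alpha[j])
            d_i, d_j = y[i] * t, -y[j] * t
            alpha[i] += d_i
            alpha[j] += d_j
            grad += Q[:, i] * d_i + Q[:, j] * d_j    # keep the gradient up to date
            history.append(0.5 * (alpha @ grad - alpha.sum()))   # current dual objective
        return alpha, history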

 

3.  (50  points) The  goal is to evaluate  the  performance  of optimization algorithms  for SVMs which work on the primal.  The problem is based on the following papers:

 

(1)  “Pegasos: Primal Estimated sub-GrAdient SOlver for SVM,” by S. Shalev-Shwartz, Y. Singer, and N. Srebro. Please read Section 2 of the paper. The algorithm discussed in the paper is a (stochastic) mini-batch (projected) sub-gradient descent method for the SVM non-smooth convex optimization problem. For the experiments, we will assume b = 0 in the affine form f(x) = w^T x + b (you can see related discussions in Sections 4 and 5).

(2)  “Accelerating Stochastic Gradient Descent using Predictive Variance Reduction,” by R. Johnson and T. Zhang. Please read Section 2 and Figure 1 of the paper. The algorithm discussed in the paper, called SVRG (stochastic variance reduced gradient), is a variant of stochastic gradient descent (SGD) with variance reduction. We will use SVRG for learning linear SVMs using sub-gradients as in Pegasos and assuming b = 0 in the function f(x) = w^T x + b. (The common primal objective both methods minimize is written out after this list.)
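Both methods minimize the regularized primal SVM objective with b = 0. The assignment does not write it out, so the standard form below (following Section 2 of the Pegasos paper, with regularization parameter λ, which you choose and report) is stated here as an assumption for the plots:

    f(w) = (λ/2) ||w||^2 + (1/n) Σ_{i=1}^n max(0, 1 − y_i w^T x_i) .

A sub-gradient of the per-example term ψ_i(w) = (λ/2)||w||^2 + max(0, 1 − y_i w^T x_i) is λw − y_i x_i when y_i w^T x_i < 1 and λw otherwise; this is the quantity both Pegasos and SVRG use in place of a gradient.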

 

For the homework, you have to do the following:

 

(1)  (25 points) Implement the Pegasos algorithm and evaluate its performance on the MNIST-13 dataset. The evaluation will be based on the optimization performance, and not classification accuracy. So, we will use the entire dataset for training (optimization). For the runtime results, run the code 5 times and report the average runtime and standard deviation. The runs will be repeated with different choices of mini-batch size (see below).

(2)  (25 points) Implement the SVRG algorithm and evaluate its performance on the MNIST-13 dataset. The evaluation will be based on the optimization performance, and not classification accuracy. So, we will use the entire dataset for training (optimization). For the runtime results, run the code 5 times and report the average runtime and standard deviation. The runs will be repeated with different choices of epoch size (see below).

 

For each algorithm, you will have to submit (i) a summary of methods and results, and (ii) code:

 

(i)  Summary of methods and results: Pegasos works with a mini-batch A_t of size k in iteration t. When k = 1, we get (projected) stochastic gradient descent, and when k = n, we get (projected) gradient descent, since the entire dataset is considered in each iteration (a minimal sketch of both update rules is given after this list). For the runs, we will consider the following values of k: (a) k = 1, (b) k = 20 (1% of data), (c) k = 200 (10% of data), (d) k = 1000 (50% of data), and (e) k = 2000 (100% of data). For fixed-percent mini-batches, pick the corresponding percentages from each class, e.g., for 10%, pick 10% of samples from each of the two classes.

For each choice of k as above, report the average runtime of Pegasos over 5 runs on the entire dataset, along with the standard deviation. For each choice of k, plot the primal objective function value for each run with increasing number of iterations. Thus, there will be a figure with 5 plots (different runs) overlaid for each choice of k, and a separate figure for each k.

SVRG works with an SGD epoch size m for each outer iteration s. When m = 1, we do just one SGD update in every outer iteration, and when m = n, we do n SGD updates in each outer iteration. For the runs, we will consider the following values of m: (a) m = 1, (b) m = 20 (1% of data), (c) m = 200 (10% of data), (d) m = 1000 (50% of data), and (e) m = 2000 (100% of data).

For each choice of m as above, report the average runtime of SVRG over 5 runs on the entire dataset, along with the standard deviation. For each choice of m, plot the primal objective function value for each run with increasing number of gradient computations, as used in the paper. Thus, there will be a figure with 5 plots (different runs) overlaid for each choice of m, and a separate figure for each m.

Termination condition: Please use k_tot, the total number of gradient computations, as the termination condition. For this question, set k_tot = 100n, where n is your data size (number of data points). The algorithm terminates when k_tot is reached.

(ii)  Code: You will have to submit code for (1) myPegasos(filename, k, numruns) (main file), and (2) mySVRG(filename, m, numruns) (main file). The main files have input: (1) a filename (including extension and absolute path) containing the dataset, (2) mini-batch size k for Pegasos and SGD epoch size m for SVRG, and (3) the number of runs, and output: (1) the average runtime and standard deviations printed to the terminal (stdout). The functions must take the inputs in this order and display the output via the terminal. The filename will correspond to a plain text file for a dataset, with each line corresponding to a data point: the first entry will be the label and the rest of the entries will be feature values of the data point. The data for the plots can be written in a file tmp in the same directory, but (i) we will not be using this data for doing plots, and (ii) you do not have to explain how you did the plots from this data.

You can submit  additional files/functions (as needed)  which will be used by the  main file. Put  comments  in your code so that one can follow the key parts  and steps in your code.
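A minimal sketch of both update rules follows, under stated assumptions: the regularization parameter lam and the SVRG step size eta are illustrative values the assignment does not fix (choose and report your own), sampling is uniform rather than the per-class sampling required for fixed-percent mini-batches, and the names (pegasos_sketch, svrg_sketch, hinge_subgrad, primal_objective) are assumptions. Neither function implements the required myPegasos/mySVRG file-and-timing interface; they only show the core iterations.

    import numpy as np

    def hinge_subgrad(w, x, y, lam):
        """Sub-gradient of (lam/2)||w||^2 + max(0, 1 - y w^T x) at w."""
        g = lam * w
        if y * (w @ x) < 1:
            g = g - y * x
        return g

    def primal_objective(w, X, y, lam):
        """Primal SVM objective with b = 0 (the quantity to plot)."""
        margins = 1.0 - y * (X @ w)
        return 0.5 * lam * (w @ w) + np.maximum(margins, 0.0).mean()

    def pegasos_sketch(X, y, lam=1e-4, k=20, max_grads=None):
        """Mini-batch Pegasos (Section 2 of the paper), b = 0."""
        n, d = X.shape
        max_grads = 100 * n if max_grads is None else max_grads
        w, obj, grads_used, t = np.zeros(d), [], 0, 0
        rng = np.random.default_rng()
        while grads_used < max_grads:
            t += 1
            eta = 1.0 / (lam * t)
            A = rng.choice(n, size=k, replace=False)          # mini-batch A_t
            active = A[y[A] * (X[A] @ w) < 1.0]               # points with hinge violation
            w = (1.0 - eta * lam) * w
            if active.size:
                w += (eta / k) * (y[active, None] * X[active]).sum(axis=0)
            # optional projection onto the ball of radius 1/sqrt(lam)
            w *= min(1.0, 1.0 / (np.sqrt(lam) * (np.linalg.norm(w) + 1e-12)))
            grads_used += k
            obj.append(primal_objective(w, X, y, lam))        # one value per iteration
        return w, obj

    def svrg_sketch(X, y, lam=1e-4, m=20, eta=0.01, max_grads=None):
        """SVRG (Figure 1 of Johnson & Zhang) with hinge sub-gradients, b = 0."""
        n, d = X.shape
        max_grads = 100 * n if max_grads is None else max_grads
        w_tilde, obj, grads_used = np.zeros(d), [], 0
        rng = np.random.default_rng()
        while grads_used < max_grads:
            # full (sub)gradient at the snapshot w_tilde
            mu = np.array([hinge_subgrad(w_tilde, X[i], y[i], lam) for i in range(n)]).mean(axis=0)
            grads_used += n
            w = w_tilde.copy()
            for _ in range(m):
                i = rng.integers(n)
                g = hinge_subgrad(w, X[i], y[i], lam) - hinge_subgrad(w_tilde, X[i], y[i], lam) + mu
                w = w - eta * g
                grads_used += 2                               # two component gradients per step
                obj.append(primal_objective(w, X, y, lam))
            w_tilde = w                                       # option I: keep the last inner iterate
        return w_tilde, obj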

 

 

Instructions

 

Follow the rules strictly. If we  cannot run your functions, you  get 0 points.

 

•  Things to submit

 

1.  hw2.pdf: The report that contains the solutions to Problems 1, 2, and 3, including the summary of methods and results.

2.  mySmoSVM: Code for Question  2.

3.  myPegasos and mySVRG: Code for Question  3.

4.  README.txt: README file that contains your name, student ID, email, instructions on how to run your program, any assumptions you’re making, and any other necessary details.

5.  Any other  files, except the data,  which are necessary for your program.
