$24
Goals. The goal of this exercise is to
Implement and debug Support Vector Machine (SVM) using SGD and coordinate descent.
Derive updates for the coordinate descent algorithm for the dual optimization problem for SVM. Implement and debug the coordinate descent algorithm.
Compare it to the primal solution.
Setup, data and sample code. Obtain the folder labs/ex07 of the course github repository
github.com/epfml/ML course
We will nally depart from using the height-weight dataset and instead use the larger CERN dataset from Project 1 in this exercise. We have provided sample code templates that already contain useful snippets of code required for this exercise.
Support Vector Machines using SGD
Until now we have implemented linear and logistic regression to do classi cation. In this exercise we will use the Support Vector Machine (SVM) for classi cation. As we have seen in the lecture notes, the original optimization problem for the Support Vector Machine (SVM) is given by
2
N
nX
xw) +
2
(1)
min
`(y
w
2 k
w RD
n
n
k
=1
where ` : R ! R, `(z) := maxf0; 1 zg is the hinge loss function. Here for any n, 1 n N, the vector xn 2 RD is the nth data example, and yn 2 f 1g is the corresponding label.
Problem 1 (SGD for SVM):
Implement stochastic gradient descent (SGD) for the original SVM formulation (1). That is in every iteration, pick one data example n 2 [N] uniformly at random, and perform an update on w based on the (sub-)gradient of the nth summand of the objective (1). Then iterate by picking the next n.
Fill in the notebook functions calculate accuracy(y, X, w) which computes the accuracy on the train-ing/test dataset for any w and calculate primal objective(y, X, w, lambda ) which computes the total primal objective (1).
Derive the SGD updates for the original SVM formulation and ll in the notebook function
calculate stochastic gradient() which should return the stochastic gradient of the total cost function (loss plus regularizer) with respect to w. Finally, use sgd for svm demo() provided in the template for training.
Support Vector Machines using Coordinate Descent
As seen in class, another approach to train SVMs is by considering the dual optimization problem given by
max 1
1
Y XXY
such that 0
n
1
8
n
(2)
2
2RN
where Y := diag(y), and X 2 RN D again collects all N data examples as its rows, as usual. In this approach we optimize over the dual variables and map the solutions back to the primal vector w.
Problem 2 (Coordinate Descent for SVM):
Derive the coordinate descent algorithm updates for the dual (2) of the SVM formulation. That is, in every iteration, pick a coordinate n 2 [N] uniformly at random, and fully optimize the objective (2) with respect to that coordinate alone.
After updating that coordinate n, update the corresponding primal vector w such that the rst-order correspon-dence is maintained, that is that always w = w( ) := 1 XY . Then iterate by picking the next coordinate n.
Mathematically derive the coordinate update for one coordinate n ( nding the closed-form solution to maximization over just that coordinate), when given and corresponding w.
Fill in the notebook functions calculate coordinate update() which should compute the coordinate update for a single desired coordinate and calculate dual objective() which should return the objective (loss) for the dual problem (2) .
Finally train your model using coordinate descent (here ascent) using the given function sgd for svm demo() in the template. Compare to your SGD implementation. Which one is faster? (Compare the training ob-jective values (1) for the w iterates you obtain from each method).
2Goals. The goal of this exercise is to
Implement and debug Support Vector Machine (SVM) using SGD and coordinate descent.
Derive updates for the coordinate descent algorithm for the dual optimization problem for SVM. Implement and debug the coordinate descent algorithm.
Compare it to the primal solution.
Setup, data and sample code. Obtain the folder labs/ex07 of the course github repository
github.com/epfml/ML course
We will nally depart from using the height-weight dataset and instead use the larger CERN dataset from Project 1 in this exercise. We have provided sample code templates that already contain useful snippets of code required for this exercise.
Support Vector Machines using SGD
Until now we have implemented linear and logistic regression to do classi cation. In this exercise we will use the Support Vector Machine (SVM) for classi cation. As we have seen in the lecture notes, the original optimization problem for the Support Vector Machine (SVM) is given by
2
N
nX
xw) +
2
(1)
min
`(y
w
2 k
w RD
n
n
k
=1
where ` : R ! R, `(z) := maxf0; 1 zg is the hinge loss function. Here for any n, 1 n N, the vector xn 2 RD is the nth data example, and yn 2 f 1g is the corresponding label.
Problem 1 (SGD for SVM):
Implement stochastic gradient descent (SGD) for the original SVM formulation (1). That is in every iteration, pick one data example n 2 [N] uniformly at random, and perform an update on w based on the (sub-)gradient of the nth summand of the objective (1). Then iterate by picking the next n.
Fill in the notebook functions calculate accuracy(y, X, w) which computes the accuracy on the train-ing/test dataset for any w and calculate primal objective(y, X, w, lambda ) which computes the total primal objective (1).
Derive the SGD updates for the original SVM formulation and ll in the notebook function
calculate stochastic gradient() which should return the stochastic gradient of the total cost function (loss plus regularizer) with respect to w. Finally, use sgd for svm demo() provided in the template for training.
Support Vector Machines using Coordinate Descent
As seen in class, another approach to train SVMs is by considering the dual optimization problem given by
max 1
1
Y XXY
such that 0
n
1
8
n
(2)
2
2RN
where Y := diag(y), and X 2 RN D again collects all N data examples as its rows, as usual. In this approach we optimize over the dual variables and map the solutions back to the primal vector w.
Problem 2 (Coordinate Descent for SVM):
Derive the coordinate descent algorithm updates for the dual (2) of the SVM formulation. That is, in every iteration, pick a coordinate n 2 [N] uniformly at random, and fully optimize the objective (2) with respect to that coordinate alone.
After updating that coordinate n, update the corresponding primal vector w such that the rst-order correspon-dence is maintained, that is that always w = w( ) := 1 XY . Then iterate by picking the next coordinate n.
Mathematically derive the coordinate update for one coordinate n ( nding the closed-form solution to maximization over just that coordinate), when given and corresponding w.
Fill in the notebook functions calculate coordinate update() which should compute the coordinate update for a single desired coordinate and calculate dual objective() which should return the objective (loss) for the dual problem (2) .
Finally train your model using coordinate descent (here ascent) using the given function sgd for svm demo() in the template. Compare to your SGD implementation. Which one is faster? (Compare the training ob-jective values (1) for the w iterates you obtain from each method).
2