The assignment should be submitted in PDF format through Collab. If you prefer hand-writing the QA parts of your answers, please convert them (e.g., by scanning or using phone apps like Office Lens) into PDF form.
For questions and clarifications, please post on Piazza.
Policy on collaboration:
Homework should be done individually: each student must hand in their own answers. It is acceptable, however, for students to collaborate in figuring out answers and helping each other solve the problems. We will be assuming that, under the honor code, you take responsibility for making sure you personally understand the solution to any work arising from such collaboration.
Policy on late homework: Homework is worth full credit at midnight on the due date. Each student has three extension days to be used at his or her own discretion throughout the entire course. Your grade is discounted by 15% per day when you use these late days. You can use the 3 days in whatever combination you like; for example, all 3 days on one assignment (for a maximum grade of 55%) or 1 day on each of 3 assignments (for a maximum grade of 85% on each). After you have used all 3 days, you cannot get credit for anything turned in late.
1. Support Vector Machines with Scikit-Learn
(1) Install the latest stable version of scikit-learn following the directions available at http://scikit-learn.org/stable/install.html. Also make sure to download "salary.labeled.csv" from Collab.
(2) For this assignment, you will create a program using scikit-learn's C-Support Vector Classifier.¹ Given a proper set of attributes, the program will be able to determine whether an individual makes more than 50,000 USD/year. You may use code from HW2 to help you import the data. Bear in mind you will also need to do some preprocessing of the data before applying the SVM.
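As a sketch of this preprocessing step, the snippet below one-hot encodes the categorical attributes from Table 1 and standardizes the result, since SVMs are sensitive to feature scale. This is an illustration, not part of the required starter code; the helper name load_and_preprocess is made up here.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

def load_and_preprocess(csv_path):
    """Load the salary data and encode it numerically for an SVM.

    Categorical columns are one-hot encoded; all columns are then
    standardized to zero mean / unit variance.
    """
    df = pd.read_csv(csv_path)
    labels = df.iloc[:, -1]        # last column holds the class label
    features = df.iloc[:, :-1]

    # One-hot encode every non-numeric column (workclass, education, ...).
    features = pd.get_dummies(features)

    # Standardize so that no single attribute dominates the kernel.
    scaled = StandardScaler().fit_transform(features)
    return scaled, labels
```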
Two sample files are provided. The unlabeled sample set "salary.2Predict.csv" is a text file in the same format as the labeled dataset "salary.labeled.csv", except that its last column includes a fake field for class labels.
2.1 You are required to provide the predicted labels for samples in "salary.2Predict.csv".
2.2 We will evaluate your output 'predictions': an array of strings (">50K" or "<=50K") corresponding to the true labels of these test samples (note: you do not have these labels!). This simulates a Kaggle competition, in which test labels are always held out and only the team ranking is released after all teams have submitted their predictions. When grading this assignment, we will rank all students' predictions, so please try to submit the best-performing model that you can!
¹ Documented here: http://scikit-learn.org/stable/modules/classes.html#module-sklearn.svm
2.3 You need to report the classification accuracy results from 3-fold cross validation (CV) on the labeled set using at least three different SVM kernels of your choice. Please provide details about the kernels you have tried and their performance (e.g., classification accuracy) on train and test folds in the write-up. For instance, you can summarize the results in a table with each row containing the kernel choice, kernel parameters, CV train accuracy, and CV test accuracy.
(Hint: you can choose SVM kernels such as the basic linear kernel, the polynomial kernel (varying its parameters), or the RBF kernel (varying its parameters).)
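The 3-fold CV comparison can be sketched as follows. The synthetic make_classification data here is only a stand-in for the preprocessed salary.labeled.csv features:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.svm import SVC

# Stand-in data; in the assignment, X and y come from the preprocessed
# labeled salary dataset.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

kernels = [
    ("linear", SVC(kernel="linear", C=1.0)),
    ("poly (degree=3)", SVC(kernel="poly", degree=3, C=1.0)),
    ("rbf (gamma='scale')", SVC(kernel="rbf", gamma="scale", C=1.0)),
]

# One row per kernel: mean train and test accuracy over the 3 folds.
for name, clf in kernels:
    scores = cross_validate(clf, X, y, cv=3, return_train_score=True)
    print(f"{name:22s} train acc: {scores['train_score'].mean():.3f}  "
          f"test acc: {scores['test_score'].mean():.3f}")
```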
Submission Instructions: You are required to submit the following:
(The starter code, 'income_classifier.py', has been provided in Collab.)
1. A Python program that includes the statements:
clf = SvmIncomeClassifier()
trained_model, cv_score = clf.train_and_select_model("salary.labeled.csv")
It should be able to train and select a model using a set of hyperparameters on the training data; these hyperparameters can be hard-coded or input by the user.
Next, we should be able to use a trained model to classify an unlabeled test set using the following function:
predictions = clf.predict("salary.2Predict.csv", trained_model)
2. A file "predictions.txt" generated by:
clf.output_results(predictions)
Please do not archive the file or change the file name, for the sake of automated grading.
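A minimal sketch of what output_results might do, assuming the grader expects one predicted label per line (the exact expected file format is an assumption here):

```python
def output_results(predictions, path="predictions.txt"):
    """Write one predicted label per line, in test-sample order.

    `predictions` is the array of ">50K" / "<=50K" strings returned
    by clf.predict.
    """
    with open(path, "w") as f:
        for label in predictions:
            f.write(str(label) + "\n")
```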
3. A table in your PDF submission reporting classification accuracy (score) averaged over the test folds, along with details of the kernels, the best-performing hyperparameter C for each case, etc.
Classes: >50K, <=50K.
Attributes:
age: continuous.
workclass: Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked.
fnlwgt: continuous.
education: Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th,
Preschool.
education-num: continuous.
marital-status: Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse.
occupation: Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces.
relationship: Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried.
race: White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black.
sex: Female, Male.
capital-gain: continuous.
capital-loss: continuous.
hours-per-week: continuous.
native-country: United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China,
Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan,
Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands.
Table 1: About the data in Q1.
Sample Exam Questions:
Each assignment includes a few sample exam questions to help you prepare for the midterm and the final. (Please do not be bothered by the point values given in some of the exam questions.)
Question 1. Support Vector Machine
Soft-margin linear SVM can be formulated as the following constrained quadratic optimization problem:
    argmin_{w,b}   (1/2) w^T w + C Σ_{i=1}^{m} ξ_i

    such that   y_i (w^T x_i + b) ≥ 1 − ξ_i   and   ξ_i ≥ 0,   for all i,

where C is the regularization parameter, which balances the margin (smaller w^T w) and the penalty of mis-classification (smaller Σ_{i=1}^{m} ξ_i).
(a) (True/False) The number of support vectors does not depend on the selected value of C. Please provide a one-sentence justification.
Answer: False. Changing C changes the max-margin boundary. In the figure below, n_{C→0} is the number of support vectors as C approaches 0 (lim_{C→0}) and n_{C→∞} is the number of support vectors for C = ∞.
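The answer can also be illustrated numerically: on a toy dataset (illustrative only, not the figure from the exam question), refitting scikit-learn's linear SVC with a small versus a large C changes the number of support vectors.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two slightly overlapping clusters of points.
X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=0)

small_c = SVC(kernel="linear", C=0.01).fit(X, y)
large_c = SVC(kernel="linear", C=100.0).fit(X, y)

# A small C widens the margin, so more points lie on or inside it and
# become support vectors.
print("support vectors at C=0.01:", small_c.n_support_.sum())
print("support vectors at C=100: ", large_c.n_support_.sum())
```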
(b) Select the correct option: ( )
(1) n_{C→0} ≥ n_{C→∞}
(2) n_{C→0} < n_{C→∞}
(3) n_{C→0} = n_{C→∞}
(4) Not enough information provided
Answer: (1). A smaller penalty C gives a wider margin in the figure, so at least as many points fall on or inside the margin and become support vectors.
(c) Match the following kernels used for SVM classification with the figures below:
(1) Linear Kernel: K(x, x') = x^T x'
(2) Polynomial Kernel (order = 2): K(x, x') = (1 + x^T x')^2
(3) Radial Basis Kernel: K(x, x') = exp(−(1/2) ||x − x'||^2)
Answer: (2) ; (1) ; (3)
(d) Rank the best to the worst performing kernels in (c) (through visual inspection alone):
Best:
Middle:
Worst:
Answer: (3) ; (2) ; (1)
(e) What is the leave-one-out cross-validation error for maximum-margin separation in the following figure? (We are asking for a number.) Please provide an (at least one-sentence) justification.
Answer: 0. Removing any single point will not change the max-margin boundary, and all points are correctly classified, so the LOOCV error is zero.
Question 2. Another Support Vector Machine
Consider a supervised learning problem in which the training examples are points in 2-dimensional space.
The positive examples are (1, 1) and (−1, −1). The negative examples are (1, −1) and (−1, 1).
(1 pt) Are the positive examples linearly separable from the negative examples in the original space?
Answer: No.
(4 pts) Consider the feature transformation φ(x) = [1, x1, x2, x1x2], where x1 and x2 are, respectively, the first and second coordinates of a generic example x. The prediction function is y(x) = w^T φ(x) in this feature space. Give the coefficients, w, of a maximum-margin decision surface separating the positive examples from the negative examples. (You should be able to do this by inspection, without any significant computation.)
Answer: w = (0, 0, 0, 1)^T
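This answer is easy to verify numerically: with φ(x) = [1, x1, x2, x1x2], the score w^T φ(x) = x1x2 is +1 on both positive examples and −1 on both negative ones, so the margin is 1 in both directions.

```python
import numpy as np

def phi(x):
    """Feature map phi(x) = [1, x1, x2, x1*x2]."""
    x1, x2 = x
    return np.array([1.0, x1, x2, x1 * x2])

w = np.array([0.0, 0.0, 0.0, 1.0])

positives = [(1, 1), (-1, -1)]
negatives = [(1, -1), (-1, 1)]

for x in positives:
    print(x, "->", w @ phi(x))   # +1.0 for both positive examples
for x in negatives:
    print(x, "->", w @ phi(x))   # -1.0 for both negative examples
```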
(3 pts) Add one training example to the graph so that the total of five examples can no longer be linearly separated in the feature space φ(x) defined in problem 5.2.
(4 pts) What kernel K(x, x') does this feature transformation correspond to?
Answer: K(x, x') = 1 + x1·x1' + x2·x2' + x1x2·x1'x2'
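A quick sanity check that this kernel equals the inner product of the feature maps, φ(x)·φ(x'), on random points:

```python
import numpy as np

def phi(x):
    """Feature map phi(x) = [1, x1, x2, x1*x2]."""
    x1, x2 = x
    return np.array([1.0, x1, x2, x1 * x2])

def K(x, xp):
    """Kernel 1 + x1*x1' + x2*x2' + x1*x2*x1'*x2'."""
    x1, x2 = x
    x1p, x2p = xp
    return 1 + x1 * x1p + x2 * x2p + x1 * x2 * x1p * x2p

rng = np.random.default_rng(0)
for _ in range(5):
    a, b = rng.normal(size=2), rng.normal(size=2)
    assert np.isclose(K(a, b), phi(a) @ phi(b))
print("K(x, x') matches phi(x) . phi(x') on random points")
```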