Homework 4, Version 2 Solution


This homework contains three questions. The last two questions require programming. This homework gives you an opportunity to explore two popular machine learning libraries. Although these libraries are well implemented, you will soon see that you cannot treat them as black boxes. To obtain good results, you will need to know what the important hyper-parameters are and how to tune them. The maximum score for this homework is 100 + 15 bonus points.

• Question 1 – Support Vector Machines (20 points)

1.1 Linear case (10 points)

Consider training a linear SVM on a linearly separable dataset consisting of n points. Let m be the number of support vectors obtained by training on the entire set. Show that the LOOCV error is bounded above by m/n.

Hint: Consider two cases: (1) removing a support vector data point and (2) removing a non-support vector data point.
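The bound can be checked empirically before proving it. The sketch below is only an illustration, not a proof: it assumes synthetic, well-separated data from make_blobs and approximates the hard-margin SVM with a very large C (both are choices made for this example, not part of the question).

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import LeaveOneOut
from sklearn.svm import SVC

# Two well-separated clusters, so the data is linearly separable (assumed toy data).
X, y = make_blobs(n_samples=40, centers=[[-3.0, -3.0], [3.0, 3.0]],
                  cluster_std=0.5, random_state=0)
n = len(y)

def fit_svm(X, y):
    # A very large C approximates the hard-margin SVM of the question.
    return SVC(kernel="linear", C=1e6).fit(X, y)

# Number of support vectors when training on the entire set.
m = len(fit_svm(X, y).support_)

# Leave-one-out: retrain without one point, then test on that point.
errors = 0
for train_idx, test_idx in LeaveOneOut().split(X):
    clf = fit_svm(X[train_idx], y[train_idx])
    errors += int(clf.predict(X[test_idx])[0] != y[test_idx][0])

loocv_error = errors / n
print(f"m = {m}, n = {n}, LOOCV error = {loocv_error:.3f}, m/n = {m / n:.3f}")
```

Removing a non-support vector leaves the solution unchanged, so only the m support vectors can contribute errors; the printed values let you compare the two sides of the inequality.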

1.2 General case (10 points)

Now consider the same problem as above. But instead of using a linear SVM, we will use a general kernel. Assuming that the data is linearly separable in the high dimensional feature space corresponding to the kernel, does the bound in previous section still hold? Explain why or why not.

• Question 2 – XGBoost (30 points)

In this question, you will use XGBoost to predict the income of a person. The data for this question is under HW4_q2.

You can use the implementation from https://github.com/dmlc/xgboost. To install it, run pip install xgboost and then import it using from xgboost import XGBClassifier. Here is a tutorial on how to install and use it: https://machinelearningmastery.com/develop-first-xgboost-model-python-scikit-learn/.

You will use the Census Income Dataset (https://archive.ics.uci.edu/ml/datasets/census+income). This dataset includes attributes of 48842 people, such as their age and education. The task is to predict whether their income exceeds $50K per year. For each person, there are 14 attributes in total. Some attributes are categorical, like the education, and other attributes have continuous values, like the age. Review the attributes and their possible values at the link provided above.

The first step is to load your training and test data, which are provided in the files adult.data and adult.test respectively. (Tip: To load the data, you can use numpy.genfromtxt.) Inspect your data carefully: the first 14 columns of each file correspond to the features (attributes) of the persons. The last column corresponds to the labels (income ≤ 50K or > 50K). Since you have both categorical and continuous attributes, you will first need to load the data as strings. Use “dtype=object” to load them into numpy arrays (the alias np.object was removed in NumPy 1.24).

Then, you need to convert them to float. Keep the continuous attributes as they are. For the categorical attributes, you will need to convert them to integers. To do that, you can use sklearn.preprocessing.OrdinalEncoder.

For each column that corresponds to a categorical attribute, you can construct a new encoder that converts its values to integers. For example, the education attribute can be encoded as “Bachelors” = 0, “Masters” = 1, “Doctorate” = 2, etc. Do the same for the labels column (e.g. “≤ 50K” = 0 and “> 50K” = 1). At the end, convert your whole feature matrix and labels to float.

Important: Be careful when you convert the categorical attributes and labels of your test data. They need to correspond to the same integer values as in your training data. For example, if “Bachelors” corresponds to 0 in your training data, then it should also correspond to 0 in your test data. To ensure that, for each column of your training data, you can use the fit_transform method of sklearn.preprocessing.OrdinalEncoder and then use the transform method of the same encoder for the corresponding column of your test data.
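The encoding steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the three-column toy arrays stand in for the real adult.data and adult.test files, and the helper name encode and the column layout are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Toy rows standing in for adult.data / adult.test: (age, education, label).
train_raw = np.array([
    ["39", "Bachelors", "<=50K"],
    ["50", "Masters", ">50K"],
    ["38", "Doctorate", ">50K"],
    ["28", "Bachelors", "<=50K"],
], dtype=object)
test_raw = np.array([
    ["45", "Masters", ">50K"],
    ["23", "Bachelors", "<=50K"],
], dtype=object)

categorical_cols = [1, 2]  # education and the label column in this toy layout

def encode(train, test, cat_cols):
    """Encode categorical columns with one encoder per column, fitted on train only."""
    train_out = np.empty(train.shape, dtype=float)
    test_out = np.empty(test.shape, dtype=float)
    for col in range(train.shape[1]):
        if col in cat_cols:
            enc = OrdinalEncoder()
            # Fit on the training column, then reuse the same mapping for test.
            train_out[:, col] = enc.fit_transform(train[:, [col]]).ravel()
            test_out[:, col] = enc.transform(test[:, [col]]).ravel()
        else:
            # Continuous attributes are kept as they are, just cast to float.
            train_out[:, col] = train[:, col].astype(float)
            test_out[:, col] = test[:, col].astype(float)
    return train_out, test_out

train_enc, test_enc = encode(train_raw, test_raw, categorical_cols)
X_train, y_train = train_enc[:, :-1], train_enc[:, -1]
X_test, y_test = test_enc[:, :-1], test_enc[:, -1]
```

Note that OrdinalEncoder assigns integers in sorted category order, so the exact codes may differ from the example mapping in the text; what matters is that train and test share the same mapping. With the default settings, a category that appears only in the test data makes transform raise an error.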

2.1 Question 2.1 (10 points)

Using the whole training set, train an XGBoost classifier and test it on the test set. Report the accuracy and the confusion matrix on both the train and test sets.
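One way to produce the requested numbers is a small reporting helper, sketched below. To keep the example runnable without the census files or xgboost installed, it assumes synthetic data and swaps in sklearn's GradientBoostingClassifier as a stand-in model; the helper name report is hypothetical. XGBClassifier follows the same fit/predict interface, so it can be substituted where indicated.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

def report(clf, X, y, name):
    """Print and return (accuracy, confusion matrix) of a fitted classifier."""
    pred = clf.predict(X)
    acc = accuracy_score(y, pred)
    cm = confusion_matrix(y, pred, labels=[0, 1])  # rows: true class, cols: predicted
    print(f"{name}: accuracy = {acc:.4f}\n{cm}")
    return acc, cm

# Synthetic stand-in for the encoded census features and 0/1 labels.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 5))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)
X_test = rng.normal(size=(100, 5))
y_test = (X_test[:, 0] + X_test[:, 1] > 0).astype(int)

# Stand-in model; for the homework, construct xgboost.XGBClassifier() here instead.
clf = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
train_acc, train_cm = report(clf, X_train, y_train, "train")
test_acc, test_cm = report(clf, X_test, y_test, "test")
```

Comparing the train and test reports side by side makes overfitting visible: a large gap between the two accuracies suggests the default hyper-parameters need tuning.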

2.2 Question 2.2 (20 points)

Perform k-fold cross-validation to find the best set of hyper-parameters for XGBoost. To do that, you will need to split your training data (X) into train (Xtrain) and validation (Xval) sets. For each fold (split), you will train a model on the training data Xtrain (a subset of the original training data X) and test it on the validation set Xval. Use k = 10 folds. You can use sklearn.model_selection.KFold.

Report the best cross-validation accuracy. Use this set of hyper-parameters and train XGBoost on the entire training data, and report the accuracy and confusion matrix on the test data.
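The cross-validation loop can be sketched as below. This is an illustration under assumptions: the data is synthetic, the grid is a hypothetical two-parameter example, and GradientBoostingClassifier again stands in for XGBClassifier (which also accepts max_depth and n_estimators). The helper name cv_accuracy is made up for this sketch.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import KFold

def cv_accuracy(make_model, X, y, k=10, seed=0):
    """Mean validation accuracy of make_model() over k folds."""
    kf = KFold(n_splits=k, shuffle=True, random_state=seed)
    scores = []
    for train_idx, val_idx in kf.split(X):
        # Train on Xtrain (k-1 folds), evaluate on Xval (the held-out fold).
        model = make_model().fit(X[train_idx], y[train_idx])
        scores.append(np.mean(model.predict(X[val_idx]) == y[val_idx]))
    return float(np.mean(scores))

# Synthetic stand-in for the encoded training data.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

# Hypothetical small grid; extend it with the hyper-parameters you want to tune.
grid = [{"max_depth": d, "n_estimators": n} for d in (1, 3) for n in (20, 50)]

results = []
for params in grid:
    # Stand-in model; for the homework, build XGBClassifier(**params) instead.
    score = cv_accuracy(
        lambda: GradientBoostingClassifier(random_state=0, **params), X, y)
    results.append((score, params))

best_score, best_params = max(results, key=lambda r: r[0])
print(f"best CV accuracy = {best_score:.4f} with {best_params}")
```

After selecting best_params, retrain a single model with those settings on the entire training set before evaluating on the test data, as the question asks.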

• Question 3 – SVM for object detection (50 points + 15 bonus points)

In this question, you will train an SVM and use it for detecting human hands in images. You can use the SVM implementation of sklearn. For this question, it is sufficient and also much faster to use a linear SVM (sklearn.svm.LinearSVC), but you can also experiment with non-linear kernels using sklearn.svm.SVC (although training might take a long time).

As features, you will use deep features extracted from detectron2 https://github.com/facebookresearch/detectron2. We provide the utility functions for feature extraction.

To detect human hands in images, we need a classifier that can distinguish hand image patches from non-hand patches. To train such a classifier, we can use SVMs. The training data is typically a set of images with bounding boxes of the hands. Positive training examples are image patches extracted at the annotated locations. A negative training example can be any image patch that does not significantly overlap with the annotated hands. Thus there are potentially many more negative training examples than positive training examples. Due to memory limitations, it will not be possible to use all negative training examples at the same time. In this question, you will implement hard-negative mining to find the hardest negative examples and iteratively train an SVM.

3.1 Preparation

3.1.1 For Conda environment

# Assume you have downloaded the homework file and stored it in the directory hw4.

cd hw4

# Create new environment
conda create -n cse512_spring21_hw4 python=3.7
conda activate cse512_spring21_hw4

# Install pytorch
conda install pytorch torchvision cudatoolkit=10.0 -c pytorch

# Install dependency libraries
python -m pip install opencv-python mean_average_precision
python -m pip install pycocotools scikit-learn

# Install detectron
git clone https://github.com/facebookresearch/detectron2.git \
    --branch v0.1.1 detectron2_v0.1.1
cd detectron2_v0.1.1
git checkout db1614e
python -m pip install -e .

# Move HW4_q3 into detectron2_v0.1.1
mv ../HW4_q3 .
cd HW4_q3

3.1.2 For Google Colab

For this question, you will need to use a GPU. You can use Google Colab. If so, remember to change the runtime type to GPU. Then, install the following prerequisites:

!pip install mean_average_precision

!git clone https://github.com/facebookresearch/detectron2.git --branch v0.1.1 detectron2_v0.1.1

%cd detectron2_v0.1.1

!git checkout db1614e

!pip install -e .

Inside the detectron2_v0.1.1 directory, unzip the given HW4_q3.zip:

!unzip HW4_q3.zip

%cd HW4_q3

3.1.3 Data download

Download the ContactHands dataset from http://vision.cs.stonybrook.edu/~supreeth/ContactHands_data_website/ and put it inside the HW4_q3/ directory, or run:

!wget https://public.vinai.io/ContactHands.zip

!unzip ContactHands

The file ContactHands/README.md provides useful information regarding the structure of this dataset. For more information about the dataset, see:

‘Detecting Hands and Recognizing Physical Contact in the Wild.’ S. Narasimhaswamy, T. Nguyen, M. Hoai.

Advances in Neural Information Processing Systems (NeurIPS), 2020.

3.1.4 Data split

Under HW4_q3/sets/ you can find the data split that you will use for this question: train.txt corresponds to the training set, validation.txt corresponds to the validation set, extra-train.txt corresponds to more data for training (optional), and test.txt corresponds to the test set. Copy those files under ContactHands/ImageSets/Main/.

3.1.5 Annotations

Under HW4_q3/Annotations/ you can find the annotations that you will use for this question. The folder contains annotations for the training, validation and extra-train data. The annotations for the test data will not be released, but they will be used for testing your final submission result. Copy the Annotations/ folder over ContactHands/Annotations/, replacing the existing folder.

3.2 Helper functions

To help you, a number of utility functions and classes are provided in HW4_q3/. The most important functions are in hw4_utils.py:

1. Run python hw4_utils.py -va to visualize some annotated samples.

2. Use get_pos_and_random_neg() to get the initial training/validation data (dataset = ‘train’ or dataset = ‘validation’, respectively). This function returns the training/validation feature matrix D and the corresponding training/validation labels lb. There are 2 classes: positive (1) and negative (-1). Positive instances are deep features extracted at the locations of hands. Negative instances are deep features at random locations of the images. Important: You first need to initialize feat_extractor = prepare_second_stream() before calling this function.

3. Use detect() to run the sliding window detector. This returns a numpy array of bounding box locations and corresponding SVM scores. This function can be used for detecting hands in an image. It can also be used to find hardest negative examples in an image.

4. Use generate_result_file() to generate a result file (dataset = ‘validation’ or dataset = ‘test’ for the validation or test set, respectively). Set the argument num_img to run the detection on a subset of the test images (e.g. num_img=100).

5. Use compute_mAP() to compute the Average Precision for the result file.

6. Use get_iou() to compute the overlap between two rectangular regions. The overlap is defined as the area of the intersection over the area of the union. A returned detection region is considered correct (true positive) if there is an annotated hand such that the overlap between the two boxes is more than 0.5.

7. Some useful OpenCV functions to work with images are: imread, imshow, and resize.

In addition, detect.py includes the feature extraction using the detectron2.
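The overlap measure described in item 6 can be illustrated with a short standalone function. This is a sketch, not the provided get_iou(): the function name iou and the (x1, y1, x2, y2) box format are assumptions made for this example, so check hw4_utils.py for the actual argument convention.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Width and height are clamped at zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two 10x10 boxes shifted by 5 pixels: intersection 50, union 150.
overlap = iou((0, 0, 10, 10), (5, 0, 15, 10))
is_true_positive = overlap > 0.5
```

In this example the overlap is 1/3, so the detection would not count as a true positive under the 0.5 threshold.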

3.3 What to implement

1. (15 points) Use the get_pos_and_random_neg() function to get the training data and train an SVM classifier clf. You can use sklearn.svm.LinearSVC. Since you have a large amount of data, you can limit the maximum number of iterations (e.g. max_iter=1000).

Use the trained classifier to generate a result file (use generate_result_file()) for the validation data. Then run compute_mAP() to compute the AP and plot the precision-recall curve. Submit your AP and precision-recall curve on the validation data.

Algorithm 1 Hard negative mining algorithm

PosD ← all annotated hands

NegD ← random image patches

(w, b) ← trainSVM(PosD, NegD)

for iter = 1, 2, ... do

    A ← all non-support vectors in NegD

    B ← hardest negative examples    ▷ Run hand detection and find negative patches that violate the SVM margin constraint the most

    NegD ← (NegD \ A) ∪ B

    (w, b) ← trainSVM(PosD, NegD)

end for

2. Implement the hard negative mining algorithm given in Algorithm 1. Positive training data and random negative training data can be generated using the get_pos_and_random_neg() function. At each iteration, you should remove negative examples that do not correspond to support vectors from the negative set. Use the function detect() on train images to identify the hardest negative examples and include them in the negative training set.

Hints: (1) A negative example should not have significant overlap with any annotated hand. You can experiment with different thresholds, but 0.3 is a good starting point. (2) You should compute the objective value at each iteration; the objective values should not decrease. (3) To speed things up, you can write a modified version of detect() that uses a different set of bounding box proposals for training.

3. (20 points) Run the negative mining for 10 iterations. Assume your computer is not very powerful, so you cannot add more than 10000 new negative training examples at each iteration. Record the objective values (on train data) and the APs (on validation data) through the iterations. Plot the objective values. Plot the APs. On the validation data, you can also use get_pos_and_random_neg() to sample 10 negative patches per validation image. To calculate AP, use sklearn.metrics.average_precision_score.

4. (15 points) For this question, you will need to generate a .npy result file for the test data using the function generate_result_file(). You will need to submit this file by uploading it to https://forms.gle/Y5qzA6Mi5Sz5SB2u9 to receive the AP on test data. Report the AP in your answer file. Important Note: You MUST use your Stony Brook ID as the name of your submission file, i.e., your SBU ID.npy (e.g., 012345679.npy). Your submission will not be evaluated if you don’t use your SBU ID. For this question, you don’t need to have the highest AP to earn full marks.

5. (15 bonus points) Your submitted result file for test data will be automatically entered in a competition for fame (https://bit.ly/31L9Cov). We will maintain a leader board and the top three entries at the end of the competition (due date) will receive 15, 10, and 5 bonus points. The ranking is based on AP.

You can submit results as frequently as you want. However, the evaluation server evaluates submissions only twice a day, at 09:00am and 09:00pm. The system keeps only the most recent submission file, so each new submission overrides the previous ones. Therefore, you have two chances a day to evaluate your method.

You are allowed to use any feature types and classifiers for this part of the homework. In addition, you are allowed to fine-tune any part of the given code. For example, you can try tuning the sliding window detection, e.g., try different image scales, window sizes and strides. You can use more training data.

You can run the hard negative mining algorithm for as many iterations as you want, and the number of negative examples added at each iteration is not limited to 10000. You can train with all available data, including “train”, “validation”, and “extra-train”. You can also use data from other datasets. For example, see https://www3.cs.stonybrook.edu/~cvl/projects/hand_det_attention/.

Check the following papers for the state-of-the-art performance on hand detection. If your method significantly outperforms these papers, we invite you to write a paper with us! Please email us directly if you think you have an awesome technique that obtains good results.

‘Detecting Hands and Recognizing Physical Contact in the Wild.’ S. Narasimhaswamy, T. Nguyen, M. Hoai. Advances in Neural Information Processing Systems (NeurIPS), 2020.

‘Contextual Attention for Hand Detection in the Wild.’ S. Narasimhaswamy, Z. Wei, Y. Wang, J. Zhang, M. Hoai. Proceedings of the International Conference on Computer Vision (ICCV), 2019.
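The loop of Algorithm 1, together with the objective and AP tracking asked for above, can be sketched on synthetic data. Everything here is an assumption made to keep the example self-contained: random Gaussian features stand in for the deep features, the array neg_pool stands in for the negative patches that detect() would return, non-support vectors are identified by their margin (LinearSVC does not expose support vectors), and the names train_svm and mine are hypothetical.

```python
import numpy as np
from sklearn.metrics import average_precision_score
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
C = 1.0

# Synthetic stand-ins: pos_d plays the role of hand features, neg_pool the
# role of every negative patch that detection could return on train images.
pos_d = rng.normal(loc=1.0, size=(200, 10))
neg_pool = rng.normal(loc=0.0, size=(5000, 10))
x_val = np.vstack([rng.normal(loc=1.0, size=(100, 10)),
                   rng.normal(loc=0.0, size=(1000, 10))])
y_val = np.r_[np.ones(100), -np.ones(1000)]

def train_svm(pos, neg):
    x = np.vstack([pos, neg])
    y = np.r_[np.ones(len(pos)), -np.ones(len(neg))]
    clf = LinearSVC(C=C, loss="hinge", max_iter=10000).fit(x, y)
    # Textbook primal objective: 0.5 * ||w||^2 + C * sum of hinge losses.
    # (LinearSVC also lightly regularizes the intercept, so this is close to,
    # but not exactly, the quantity the solver minimizes.)
    hinge = np.maximum(0.0, 1.0 - y * clf.decision_function(x))
    return clf, 0.5 * np.sum(clf.coef_ ** 2) + C * hinge.sum()

def mine(n_iter=5, n_init=200, max_new=500, tol=1e-3):
    in_neg = np.zeros(len(neg_pool), dtype=bool)  # membership mask for NegD
    in_neg[rng.choice(len(neg_pool), n_init, replace=False)] = True
    clf, obj = train_svm(pos_d, neg_pool[in_neg])
    objectives = [obj]
    aps = [average_precision_score(y_val, clf.decision_function(x_val))]
    for _ in range(n_iter):
        scores = clf.decision_function(neg_pool)
        # A: drop negatives in NegD that lie strictly outside the margin.
        in_neg &= scores >= -1.0 - tol
        # B: add negatives outside NegD that violate the margin the most.
        cand = np.flatnonzero(~in_neg & (scores > -1.0))
        hardest = cand[np.argsort(-scores[cand])[:max_new]]
        in_neg[hardest] = True
        clf, obj = train_svm(pos_d, neg_pool[in_neg])
        objectives.append(obj)
        aps.append(average_precision_score(y_val, clf.decision_function(x_val)))
    return objectives, aps

objectives, aps = mine()
```

In the homework, the pool of candidates would come from running detect() on training images and discarding boxes that overlap an annotated hand by more than the chosen threshold; the lists objectives and aps are what you would plot across iterations.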

• What to submit

You will need to submit both your code and your answers to the questions on Blackboard. Put the answer file and your Python code in a folder named SUBID_FirstName_LastName (e.g., 10947XXXX_Barack_Obama). Zip this folder and submit the zip file on Blackboard. Your submission must be a zip file, i.e., SUBID_FirstName_LastName.zip.

The answer file should be named hw4-answers.pdf. You can use LaTeX if you wish, but it is not compulsory. The first page of hw4-answers.pdf should be the filled-in cover page at the end of this homework. The remainder of the answer file should contain the answers to Questions 1, 2, and 3.

Your Python code for Questions 2 and 3 can be in separate notebooks or python files. Make sure that the name of each file is self-explanatory.

Make sure you follow the instructions carefully.

• Cheating warnings

Don’t cheat. You must do the homework yourself, otherwise you won’t learn. You cannot ask students from previous years for help or discuss the homework with them. You cannot look up the solution online.

Cover page for hw4-answers.pdf

CSE512 Spring 2021 - Machine Learning - Homework 4

Your Name:

Solar ID:

NetID email address:

Names of people whom you discussed the homework with: