$35
This exercise is an individual exercise, that means that you have to give your own answer, and you should not copy it from somebody else. You may discuss the problem with others, but you have to formulate your own anser.
You can use any programming languange that you prefer, but Matlab or Python are recommended. The responsible teacher can only help with Matlab.
Computational Learning Theory: boosting
Before you answer the following questions, you need to read the pa-per ’A decision-theoretic generalization of on-line learning and an appli-cation to boosting’ by Y. Freund and R.E. Schapire, 1995 (also available from Brightspace, under Course Documents, Reading Material, Computa-tion Learning Theory).
Show the equivalence between the formulation of Adaboost as it is given in the slides of the course, and the formulation as it is given in the paper.
Implement a weak learner WeakLearn: the decision stump, that is min-imising the apparent error.
To nd the optimal feature f and threshold , you have to do an ex-haustive search for a training set. Convince yourself that it indeed optimizes what it should optimize.
Extend the implementation of the weak learner such that it accepts a weight per object. Convince yourself that it indeed optimizes what it should optimize.
1
Show your code.
Test the implementation of the decision stump on a simple dataset: gen-erate two classes from two Gaussian distributions, where the means are 1 = [0; 0]T and 2 = [2; 0]T , and the covariance matrices are identity matrices.) Make a scatterplot of the data, and give the optimal param-eters obtained by your decision stump. Do the parameter values make sense?
Does the decision stump change if you rescale one of the features (for instance, when you multiply feature 2 by a factor of 10)?
d. Test the implementation on (an adapted version of) the Fashion NIST dataset. On Brightspace you will nd a training and a test set, called fashion57 train.txt and fashion57 test.txt. They consist of the pixel values of small 28 28 images. In the training set, the rst 32 rows are class 1, and the next 28 are class 2. The test set is larger: the rst 195 rows contain samples from class 1, and the next 205 class 2. In Matlab you can read in the data using:
a = dlmread(’fashion57_train.txt’);
Train the decision stump on the training set and evaluate on the test set. What are the apparent error and the test error?
Implement the AdaBoost algorithm that is described in the paper of Fre-und and Shapire (use the notation and conventions from the paper, not the slides!), and implement the code to classify new and unseen data.
Show the code.
Convince yourself that it indeed optimizes what it should optimize.
Test your implementation on the simple dataset that you created in (c.). Find out which objects obtain a large weight wit.1 Keep the number of iterations T xed to, say, T = 100.
Test the implementation on the Fashion dataset. What is the classi cation error on the test objects? How does this error depend on the number
1And no, it is not su cient to just say ’object 13 and 7 have high weight’. Show me; do the more simple or more di cult objects get high weight?
2
of iterations T ? If you take the classi er with the optimal T , which training objects get a high weight? Show them!
h. Show the learning curve for n=[2,4,6,10,15,20] training objects per class.
3