This Assignment carries a total of 4 marks for CS337 Theory and 13 marks for CS335 Lab.
1 Feature Design for Perceptron
This task requires using the One-vs-rest classifier implemented in Assignment 2 for dataset D2. In the previous Assignment, we used the simplest possible features: the value of each pixel. In this task, you have to design at most 5 features using the pixel values and achieve good accuracy. The features have to be designed keeping in mind the dataset they are being applied to (see the images in data/D2/images.zip).
As a concrete illustration, consider the data you have used. In each training point (a binary matrix corresponding to an image), consider the number of separate, connected regions of white pixels. This quantity, although it does not vary across shapes, is an example of a feature that is not directly available to the classifier from the per-pixel information. Further, you can add features that cumulatively describe all the per-pixel values. You can also try analysing edges and corners. Note that you are not allowed to import any libraries for this task; strictly adhere to the TODOs in the file.
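To make this illustration concrete, here is a rough sketch of how such a count could be computed without importing any libraries. The helper name count_white_regions and the representation of the image as a 2D list of 0/1 values are assumptions made for illustration only; they are not part of the starter code.

def count_white_regions(image):
    # image: 2D list of 0/1 values; white pixels are assumed to be 1
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] == 1 and not seen[r][c]:
                regions += 1
                # flood-fill one connected region with an explicit stack
                stack = [(r, c)]
                while stack:
                    i, j = stack.pop()
                    if 0 <= i < rows and 0 <= j < cols and image[i][j] == 1 and not seen[i][j]:
                        seen[i][j] = True
                        stack.extend([(i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)])
    return regions

This particular sketch uses 4-connectivity; whether 4- or 8-connectivity is more useful depends on the shapes in D2.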
1.1 CS335: Lab
(a) In perceptron.py, fill up the function get_features(x) which, given a flattened list of pixel values for an image, returns the set of features. For the other functions you can use the code from Assignment 2. Full marks will be given if the test accuracy is at least 90%. (2 marks)
(b) Explain the features used and the motivation behind them in the report. (1 mark)
2 Logistic Regression
2.1 CS 337: Theoretical Problem
Consider the following formulation for extending logistic regression to a multi-class classification setup:
\[
P(Y = k \mid w_k, \phi(x)) = \frac{e^{w_k^T \phi(x)}}{\sum_{k'=1}^{K} e^{w_{k'}^T \phi(x)}}
\]
Here, each w_k is a class-specific vector of dimension equal to the number of features in φ(x). This formulation is called softmax. Note that this formulation is slightly different from the multi-class logistic regression mentioned in Lecture 9. Each of the weight vectors w_k is computed by optimizing the categorical cross-entropy loss function.
\[
E(W) = -\frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{K} y_k^{(i)} \log\left( P(Y = k \mid w_k, \phi(x^{(i)})) \right)
\]
Here y^{(i)} is a K-length one-hot vector representing the label of the ith example, and W is a matrix having the w_k as its columns. Note that W is a matrix of dimensions (num_features × num_classes).
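To make the role of the one-hot vector concrete, consider a small made-up example (the numbers are illustrative, not from the dataset): if K = 3, y^{(i)} = (0, 1, 0), and the model assigns probabilities (0.2, 0.7, 0.1) to the three classes, then only the true class contributes to the inner sum:
\[
-\sum_{k=1}^{3} y_k^{(i)} \log p_k = -\log 0.7 \approx 0.357 .
\]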
(a) Show that the cross-entropy used to train binary logistic regression (Eqn (3) of Lecture 9) is a special case of this. (1 mark)
(b) In this task, you need to find a vectorized expression for the gradient of E(W) with respect to W. (3 marks) One approach to this problem is the following: first assume that you have access to a matrix Z of dimension N × K, such that
\[
Z = \Phi(X) W
\]
Doing this simplifies the expression for the error to be in terms of Z. Then compute the gradient of E with respect to Z (this will be of dimension N × K, and can be computed from first principles since E is a scalar), and of Z with respect to W. Finally, use the chain rule to get the required gradient.
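As an unofficial sketch of where this route leads (writing S for softmax(Z) applied row-wise and Y for the N × K matrix stacking the one-hot labels; this notation is ours, and you should verify every step yourself in your derivation):
\[
\frac{\partial E}{\partial Z} = \frac{1}{N}\,(S - Y), \qquad
\nabla_W E = \Phi(X)^T \,\frac{\partial E}{\partial Z} = \frac{1}{N}\,\Phi(X)^T (S - Y),
\qquad \text{where } S_{ik} = \frac{e^{Z_{ik}}}{\sum_{k'} e^{Z_{ik'}}} .
\]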
2.2 CS 335: Lab Problems
(a) In this problem, we will try to predict the popularity of a song based on its properties. The dataset songs.csv contains information about numerous songs from 1990-2010, with the last column indicating whether the song made it to the Top 10 of the Billboard Hot 100 Chart (1 indicating it did, 0 otherwise). We will implement logistic regression for this binary classification problem. Perform the following tasks.
(i) Complete the function split_data() in utils.py to split the data such that the train set consists of all the songs up to and including the year 2009 and the test set consists of songs released in 2010. (0.5 marks)
(ii) Inside the function load_data() in utils.py, discard columns (up to 5) which you think won't be relevant in predicting the output. (0.5 marks)
(iii) Complete the functions train() and predict() in the BinaryLogisticRegression class in binary_logistic_regression.py. (2 marks)
Make sure you write an efficient vectorized implementation for each task. Note that since this is a binary classification task, you don't need the softmax formulation described above. You are allowed to change the learning rate lr and the maximum iterations max_iter.
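For orientation only, here is a minimal sketch of what a vectorized batch-gradient-descent implementation could look like. The class name, the bias handling, and the use of NumPy are assumptions for illustration; the provided starter code's interface may differ, and you should follow its TODOs, not this sketch.

import numpy as np

class BinaryLogisticRegressionSketch:
    """Illustrative sketch only; not the starter code's interface."""

    def __init__(self, lr=0.01, max_iter=1000):
        self.lr = lr
        self.max_iter = max_iter
        self.w = None  # weight vector, last entry acts as a bias term

    def _sigmoid(self, z):
        return 1.0 / (1.0 + np.exp(-z))

    def train(self, X, y):
        # X: (N, d) feature matrix, y: (N,) labels in {0, 1}
        N, d = X.shape
        Xb = np.hstack([X, np.ones((N, 1))])   # append a bias column
        self.w = np.zeros(d + 1)
        for _ in range(self.max_iter):
            p = self._sigmoid(Xb @ self.w)     # predicted probabilities, shape (N,)
            grad = Xb.T @ (p - y) / N          # gradient of binary cross-entropy
            self.w -= self.lr * grad
        return self

    def predict(self, X):
        Xb = np.hstack([X, np.ones((X.shape[0], 1))])
        return (self._sigmoid(Xb @ self.w) >= 0.5).astype(int)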
(b) Report the test accuracy you obtained. Now, consider a model M which predicts 0 for every song. What accuracy does this model achieve on the test set? Do you think accuracy is a good evaluation metric here? Briefly justify your answer. (1 mark)
(c) Consider a different evaluation metric, the F1 score, defined as the harmonic mean of precision and recall. Precision is defined as the fraction of positive outputs which are actually positive. Recall is defined as the fraction of actually positive samples predicted as positive. Formally, let's denote TP as true positives, FP as false positives and FN as false negatives. Then recall, precision and F1 score are defined as follows:
\[
\text{recall} = \frac{TP}{TP + FN}, \qquad
\text{precision} = \frac{TP}{TP + FP}, \qquad
F_1 = \frac{2 \cdot \text{recall} \cdot \text{precision}}{\text{recall} + \text{precision}}
\]
Complete the function f1_score(). Report the F1 score you obtain on the test set. Also, report the F1 score achieved by the model M described in the previous part. Do you think F1 is a good evaluation metric for this task? Briefly justify your answer. (2 marks)
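A minimal sketch of the metric computation follows (assuming y_true and y_pred are 0/1 sequences; the exact signature of f1_score() in the starter code may differ):

def f1_score(y_true, y_pred):
    # counts of true positives, false positives and false negatives
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # convention when precision or recall is undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)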
(d) Now, we will implement logistic regression for a multi-class classification problem using the formulation described in section 2.1. We will use the dataset data/D1 for this task. Complete the functions softmax(), train() and predict() in the LogisticRegression class in multiclass_logistic_regression.py. Use the vectorized expression you derived in problem 2.1(b) to calculate the gradient of the categorical cross-entropy loss function with respect to the weights. Again, you are free to change the learning rate and maximum iterations. (3 marks)
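For reference, a minimal sketch of a numerically stable softmax and one gradient step, assuming the vectorized expression from 2.1(b). The function names gradient_step, the argument layout, and the use of NumPy are assumptions for illustration only; adapt to the starter code's actual interface.

import numpy as np

def softmax(Z):
    # subtract the row-wise max before exponentiating for numerical stability
    Z = Z - Z.max(axis=1, keepdims=True)
    expZ = np.exp(Z)
    return expZ / expZ.sum(axis=1, keepdims=True)

def gradient_step(W, Phi, Y, lr):
    # Phi: (N, num_features), W: (num_features, K), Y: (N, K) one-hot labels
    S = softmax(Phi @ W)                    # predicted class probabilities, (N, K)
    grad = Phi.T @ (S - Y) / Phi.shape[0]   # gradient of the categorical cross-entropy
    return W - lr * grad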
(e) You have implemented both logistic regression and the perceptron for dataset D1. Which one achieves higher test accuracy? Why? Informally justify your answers. (1 mark)