1. [15 points] Binary Classifiers
(a) In order to use a linear regression model for binary classification, how do we map the regression output $w^T x$ to the class labels y ∈ {−1, 1}?
Your answer:
(b) In logistic regression, the activation function $g(a) = \frac{1}{1+e^{-a}}$ is called the sigmoid. How, then, do we map the sigmoid output $g(w^T x)$ to binary class labels y ∈ {−1, 1}?
Your answer:
(c) Is it possible to write the derivative of the sigmoid function g w.r.t. a, i.e. $\frac{\partial g}{\partial a}$, as a simple function of g itself? If so, how?
Your answer:
(d) Assume quadratic loss is used in logistic regression together with the sigmoid function. Then the program becomes:
$$\min_w f(w) := \frac{1}{2} \sum_i \left( y_i - g(w^T x_i) \right)^2$$
where y ∈ {0, 1}. To solve it by gradient descent, what would be the w update equation?
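As a reading aid, the objective in (d) can be evaluated numerically. The following is a minimal sketch, assuming w and each x_i are plain Python lists of equal length; the function names `sigmoid` and `quadratic_objective` and the toy data are hypothetical and not part of the exam. It only evaluates f(w) and performs no update step.

```python
import math

def sigmoid(a):
    """Sigmoid activation g(a) = 1 / (1 + e^{-a})."""
    return 1.0 / (1.0 + math.exp(-a))

def quadratic_objective(w, xs, ys):
    """f(w) = 1/2 * sum_i (y_i - g(w^T x_i))^2 with labels y_i in {0, 1}."""
    total = 0.0
    for x, y in zip(xs, ys):
        a = sum(wj * xj for wj, xj in zip(w, x))  # inner product w^T x_i
        total += (y - sigmoid(a)) ** 2
    return 0.5 * total

# Toy data (hypothetical): two 2-dimensional samples with labels in {0, 1}.
xs = [[1.0, 2.0], [-1.0, 0.5]]
ys = [1, 0]

# With w = 0 every sigmoid output is 0.5, so each squared residual is 0.25.
print(quadratic_objective([0.0, 0.0], xs, ys))
```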
Your answer:
(e) Assume y ∈ {−1, 1}. Consider the following program for logistic regression:
$$\min_w f(w) := \sum_i \log\left( 1 + \exp\left( -y^{(i)} w^T \phi(x^{(i)}) \right) \right).$$
The above program for binary classification makes an assumption on the samples/data points. What is the assumption?
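To make the objective in (e) concrete, here is a minimal sketch that evaluates f(w) on toy data. It assumes the feature vectors φ(x^(i)) have already been computed and are passed in as plain Python lists; the name `logistic_objective`, the argument `phis`, and the data are hypothetical. The rewritten form of log(1 + exp(−m)) is a standard numerical-stability choice, not something the exam specifies.

```python
import math

def logistic_objective(w, phis, ys):
    """f(w) = sum_i log(1 + exp(-y_i * w^T phi_i)) with labels y_i in {-1, 1}."""
    total = 0.0
    for phi, y in zip(phis, ys):
        margin = y * sum(wj * pj for wj, pj in zip(w, phi))  # y_i * w^T phi_i
        # log(1 + exp(-m)) = max(-m, 0) + log(1 + exp(-|m|)), which avoids overflow.
        total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return total

# Toy data (hypothetical): precomputed feature vectors and labels in {-1, 1}.
phis = [[1.0, 2.0], [-1.0, 0.5]]
ys = [1, -1]

# With w = 0 every margin is 0, so each term equals log(2).
print(logistic_objective([0.0, 0.0], phis, ys))
```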
Your answer: