1 Classification with Linear Regression
Consider the following 1-dimensional input x = [-2.0, -1.0, 0.5, 0.6, 5.0, 7.0] with corresponding binary class labels y = [0, 0, 1, 0, 1, 1]. Use (least-squares) linear regression, as shown in the lecture, to train on these samples and classify them. Your model should include an intercept term. (A minimal code sketch follows the questions below.)
1. Provide the coefficients of the linear regression (on x and y) and briefly explain how you computed them.
2. Classify each of the 6 samples with your linear regression model. Explain how you map the continuous output of the linear model to a class label.
3. Discuss, in your own words, why linear regression is not suitable for classification.
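For orientation, here is a minimal NumPy sketch of parts 1 and 2, assuming the data as reconstructed above. Thresholding the continuous prediction at 0.5 is one common choice, not something the exercise mandates; justifying the mapping is part of your answer to part 2.

import numpy as np

# Sketch for parts 1-2, using the data values as given in the exercise.
x = np.array([-2.0, -1.0, 0.5, 0.6, 5.0, 7.0])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])

# Design matrix with an intercept column: X = [1, x]
X = np.stack([np.ones_like(x), x], axis=1)

# Least-squares coefficients: beta = argmin ||X beta - y||^2,
# i.e. beta = (X^T X)^{-1} X^T y
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("coefficients (intercept, slope):", beta)

# Map the continuous output to a class label by thresholding at 0.5
# (the midpoint of the 0/1 targets; an assumption of this sketch).
y_hat = (X @ beta >= 0.5).astype(int)
print("predicted labels:", y_hat)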
2 Log-Likelihood Gradient and Hessian
Consider a binary classification problem with data D = \{(x_i, y_i)\}_{i=1}^n, x_i \in \mathbb{R}^d and y_i \in \{0, 1\}. We define

f(x) = \phi(x)^\top \beta, \qquad p(x) = \sigma(f(x)), \qquad \sigma(z) = 1/(1 + e^{-z})

L^{\mathrm{nll}}(\beta) = -\sum_{i=1}^n \Big[ y_i \log p(x_i) + (1 - y_i) \log[1 - p(x_i)] \Big]

where \beta \in \mathbb{R}^d is the parameter vector. (Note: p(x) is a short-hand for p(y = 1 \mid x).)
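To make these definitions concrete, here is a small NumPy sketch (an illustration of this write-up, not part of the exercise). The feature map \phi is left abstract and represented by a precomputed matrix Phi whose i-th row is \phi(x_i)^\top.

import numpy as np

def sigma(z):
    # Logistic function: sigma(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

def nll(beta, Phi, y):
    # Negative log-likelihood L^nll(beta); Phi is the n-by-d matrix
    # whose i-th row is the feature vector phi(x_i).
    p = sigma(Phi @ beta)  # p(x_i) = sigma(phi(x_i)^T beta)
    return -np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))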
1. Compute the derivative \frac{\partial}{\partial \beta} L(\beta). Tip: Use the fact that \frac{\partial}{\partial z} \sigma(z) = \sigma(z)(1 - \sigma(z)).

2. Compute the 2nd derivative \frac{\partial^2}{\partial \beta^2} L(\beta). (A numerical sanity check for both results is sketched below.)
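The closed forms \nabla L = \Phi^\top (p - y) and \nabla^2 L = \Phi^\top \mathrm{diag}(p(1-p)) \Phi (with \Phi the feature matrix and p the vector of p(x_i)) are the standard results for this loss; deriving them is the point of the exercise, so the sketch below treats them only as candidates and verifies them by finite differences on random toy data.

import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

def nll(beta, Phi, y):
    p = sigma(Phi @ beta)
    return -np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def grad(beta, Phi, y):
    # Candidate gradient: Phi^T (p - y)
    return Phi.T @ (sigma(Phi @ beta) - y)

def hess(beta, Phi, y):
    # Candidate Hessian: Phi^T diag(p (1 - p)) Phi
    p = sigma(Phi @ beta)
    return Phi.T @ (Phi * (p * (1.0 - p))[:, None])

rng = np.random.default_rng(0)
Phi = rng.normal(size=(20, 3))                 # toy features: n=20, d=3
y = rng.integers(0, 2, size=20).astype(float)  # toy labels
beta = rng.normal(size=3)

# Central finite differences along each coordinate direction e_i
eps = 1e-6
num_grad = np.array([(nll(beta + eps * e, Phi, y)
                      - nll(beta - eps * e, Phi, y)) / (2 * eps)
                     for e in np.eye(3)])
num_hess = np.array([(grad(beta + eps * e, Phi, y)
                      - grad(beta - eps * e, Phi, y)) / (2 * eps)
                     for e in np.eye(3)])
print("gradient matches:", np.allclose(grad(beta, Phi, y), num_grad, atol=1e-5))
print("Hessian matches:", np.allclose(hess(beta, Phi, y), num_hess, atol=1e-4))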
3 Discriminative Function in Logistic Regression
Logistic Regression defines class probabilities as proportional to the exponential of a discriminative function:
P(y \mid x) = \frac{\exp f(x, y)}{\sum_{y'} \exp f(x, y')}
Prove that, in the binary classification case, you can assume f(x, 0) = 0 without loss of generality.
This results in

P(y = 1 \mid x) = \frac{\exp f(x, 1)}{1 + \exp f(x, 1)} = \sigma(f(x, 1)).
(Hint: First assume f(x, y) = \phi(x, y)^\top \beta, and then define a new discriminative function f' as a function of the old one, such that f'(x, 0) = 0 and for which P(y \mid x) maintains the same expressibility.)
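The invariance behind this claim is easy to check numerically. In the sketch below, the values of f(x, 0) and f(x, 1) at some fixed x are arbitrary assumptions; subtracting f(x, 0) from both yields f' with f'(x, 0) = 0 while leaving P(y \mid x) unchanged.

import numpy as np

# Arbitrary values of f(x,0) and f(x,1) at some fixed input x.
f0, f1 = 0.7, -1.3

def posterior(a, b):
    # P(y|x) as normalized exponentials (softmax) of [f(x,0), f(x,1)]
    z = np.exp([a, b])
    return z / z.sum()

print(posterior(f0, f1))                  # original model
print(posterior(0.0, f1 - f0))            # shifted model with f'(x,0) = 0
print(1.0 / (1.0 + np.exp(-(f1 - f0))))   # sigma(f'(x,1)) = P(y=1|x)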