1 Classification with Linear Regression
Consider the following 1-dimensional input x = [-2.0, -1.0, 0.5, 0.6, 5.0, 7.0] with corresponding binary class labels y = [0, 0, 1, 0, 1, 1]. Use (least-squares) linear regression, as shown in the lecture, to train on these samples and classify them. Your model should include an intercept term. (A minimal code sketch follows the questions below.)
1. Provide the coefficients of the linear regression (on x and y) and briefly explain how you computed them.
2. Classify each of the 6 samples with your linear regression model. Explain how you map the continuous output of the linear model to a class label.
3. Discuss, in your own words, why linear regression is not suitable for classification.
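For orientation, here is a minimal NumPy sketch of parts 1 and 2, assuming the data as reconstructed above. Thresholding the continuous prediction at 0.5 is one common choice, not something the exercise mandates; justifying the mapping is part of your answer to part 2.

import numpy as np

# Sketch for parts 1-2, using the data values as given in the exercise.
x = np.array([-2.0, -1.0, 0.5, 0.6, 5.0, 7.0])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])

# Design matrix with an intercept column: X = [1, x]
X = np.stack([np.ones_like(x), x], axis=1)

# Least-squares coefficients: beta = argmin ||X beta - y||^2,
# i.e. beta = (X^T X)^{-1} X^T y
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("coefficients (intercept, slope):", beta)

# Map the continuous output to a class label by thresholding at 0.5
# (the midpoint of the 0/1 targets; an assumption of this sketch).
y_hat = (X @ beta >= 0.5).astype(int)
print("predicted labels:", y_hat)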
2 Log-Likelihood Gradient and Hessian
Consider a binary classification problem with data D = \{(x_i, y_i)\}_{i=1}^n, x_i \in \mathbb{R}^d and y_i \in \{0, 1\}. We define

f(x) = \phi(x)^\top \beta, \qquad p(x) = \sigma(f(x)), \qquad \sigma(z) = 1/(1 + e^{-z})

L^{\mathrm{nll}}(\beta) = -\sum_{i=1}^n \Big[ y_i \log p(x_i) + (1 - y_i) \log[1 - p(x_i)] \Big]

where \beta \in \mathbb{R}^d is the parameter vector. (Note: p(x) is a short-hand for p(y = 1 \mid x).)
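To make these definitions concrete, here is a small NumPy sketch (an illustration of this write-up, not part of the exercise). The feature map \phi is left abstract and represented by a precomputed matrix Phi whose i-th row is \phi(x_i)^\top.

import numpy as np

def sigma(z):
    # Logistic function: sigma(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

def nll(beta, Phi, y):
    # Negative log-likelihood L^nll(beta); Phi is the n-by-d matrix
    # whose i-th row is the feature vector phi(x_i).
    p = sigma(Phi @ beta)  # p(x_i) = sigma(phi(x_i)^T beta)
    return -np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))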
1. Compute the derivative \frac{\partial}{\partial \beta} L(\beta). Tip: Use the fact that \frac{\partial}{\partial z} \sigma(z) = \sigma(z)(1 - \sigma(z)).

2. Compute the 2nd derivative \frac{\partial^2}{\partial \beta^2} L(\beta). (A numerical sanity check for both results is sketched below.)
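The closed forms \nabla L = \Phi^\top (p - y) and \nabla^2 L = \Phi^\top \mathrm{diag}(p(1-p)) \Phi (with \Phi the feature matrix and p the vector of p(x_i)) are the standard results for this loss; deriving them is the point of the exercise, so the sketch below treats them only as candidates and verifies them by finite differences on random toy data.

import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

def nll(beta, Phi, y):
    p = sigma(Phi @ beta)
    return -np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def grad(beta, Phi, y):
    # Candidate gradient: Phi^T (p - y)
    return Phi.T @ (sigma(Phi @ beta) - y)

def hess(beta, Phi, y):
    # Candidate Hessian: Phi^T diag(p (1 - p)) Phi
    p = sigma(Phi @ beta)
    return Phi.T @ (Phi * (p * (1.0 - p))[:, None])

rng = np.random.default_rng(0)
Phi = rng.normal(size=(20, 3))                 # toy features: n=20, d=3
y = rng.integers(0, 2, size=20).astype(float)  # toy labels
beta = rng.normal(size=3)

# Central finite differences along each coordinate direction e_i
eps = 1e-6
num_grad = np.array([(nll(beta + eps * e, Phi, y)
                      - nll(beta - eps * e, Phi, y)) / (2 * eps)
                     for e in np.eye(3)])
num_hess = np.array([(grad(beta + eps * e, Phi, y)
                      - grad(beta - eps * e, Phi, y)) / (2 * eps)
                     for e in np.eye(3)])
print("gradient matches:", np.allclose(grad(beta, Phi, y), num_grad, atol=1e-5))
print("Hessian matches:", np.allclose(hess(beta, Phi, y), num_hess, atol=1e-4))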
3 Discriminative Function in Logistic Regression
Logistic Regression defines class probabilities as proportional to the exponential of a discriminative function:
P(y \mid x) = \frac{\exp f(x, y)}{\sum_{y'} \exp f(x, y')}
Prove that, in the binary classification case, you can assume f(x, 0) = 0 without loss of generality.
This results in

P(y = 1 \mid x) = \frac{\exp f(x, 1)}{1 + \exp f(x, 1)} = \sigma(f(x, 1)).
(Hint: First assume f(x, y) = \phi(x, y)^\top \beta, and then define a new discriminative function f' as a function of the old one, such that f'(x, 0) = 0 and for which P(y \mid x) maintains the same expressibility.)
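The invariance behind this claim is easy to check numerically. In the sketch below, the values of f(x, 0) and f(x, 1) at some fixed x are arbitrary assumptions; subtracting f(x, 0) from both yields f' with f'(x, 0) = 0 while leaving P(y \mid x) unchanged.

import numpy as np

# Arbitrary values of f(x,0) and f(x,1) at some fixed input x.
f0, f1 = 0.7, -1.3

def posterior(a, b):
    # P(y|x) as normalized exponentials (softmax) of [f(x,0), f(x,1)]
    z = np.exp([a, b])
    return z / z.sum()

print(posterior(f0, f1))                  # original model
print(posterior(0.0, f1 - f0))            # shifted model with f'(x,0) = 0
print(1.0 / (1.0 + np.exp(-(f1 - f0))))   # sigma(f'(x,1)) = P(y=1|x)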