Machine Learning (CS405) – Homework #2 Solution

Question 1


(a) [True or False] If two sets of variables are jointly Gaussian, then the conditional distribution of one set conditioned on the other is again Gaussian. Similarly, the marginal distribution of either set is also Gaussian.

(b) We consider a partitioning of the components of $x$ into three groups $x_a$, $x_b$, and $x_c$, with a corresponding partitioning of the mean vector $\mu$ and of the covariance matrix $\Sigma$ in the form

$$
\mu = \begin{pmatrix} \mu_a \\ \mu_b \\ \mu_c \end{pmatrix}, \qquad
\Sigma = \begin{pmatrix}
\Sigma_{aa} & \Sigma_{ab} & \Sigma_{ac} \\
\Sigma_{ba} & \Sigma_{bb} & \Sigma_{bc} \\
\Sigma_{ca} & \Sigma_{cb} & \Sigma_{cc}
\end{pmatrix}.
$$

Find an expression for the conditional distribution $p(x_a \mid x_b)$ in which $x_c$ has been marginalized out.
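A sketch of the route suggested by part (a): marginalizing $x_c$ out of a joint Gaussian simply drops the corresponding blocks of $\mu$ and $\Sigma$, and conditioning then follows the standard partitioned-Gaussian formulas,

$$
p(x_a, x_b) = \mathcal{N}\!\left(\begin{pmatrix} x_a \\ x_b \end{pmatrix} \,\middle|\, \begin{pmatrix} \mu_a \\ \mu_b \end{pmatrix}, \begin{pmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{pmatrix}\right),
$$
$$
p(x_a \mid x_b) = \mathcal{N}\!\left(x_a \mid \mu_a + \Sigma_{ab}\Sigma_{bb}^{-1}(x_b - \mu_b),\; \Sigma_{aa} - \Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba}\right).
$$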



Question 2


Consider a joint distribution over the variable


$$
z = \begin{pmatrix} x \\ y \end{pmatrix}
$$

whose mean and covariance are given by

$$
\mathbb{E}[z] = \begin{pmatrix} \mu \\ A\mu + b \end{pmatrix}, \qquad
\operatorname{cov}[z] = \begin{pmatrix} \Lambda^{-1} & \Lambda^{-1}A^{T} \\ A\Lambda^{-1} & L^{-1} + A\Lambda^{-1}A^{T} \end{pmatrix}.
$$

(a) Show that the marginal distribution $p(x)$ is given by $p(x) = \mathcal{N}(x \mid \mu, \Lambda^{-1})$.

(b) Show that the conditional distribution $p(y \mid x)$ is given by $p(y \mid x) = \mathcal{N}(y \mid Ax + b, L^{-1})$.
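A compact way to see both results (a sketch, reading off the blocks of $\mathbb{E}[z]$ and $\operatorname{cov}[z]$ with the partitioned-Gaussian marginal and conditional formulas from Question 1):

$$
p(x) = \mathcal{N}\!\left(x \mid \mu, \Lambda^{-1}\right),
$$
$$
p(y \mid x) = \mathcal{N}\!\left(y \mid (A\mu + b) + A\Lambda^{-1}\Lambda(x - \mu),\; (L^{-1} + A\Lambda^{-1}A^{T}) - A\Lambda^{-1}\Lambda\Lambda^{-1}A^{T}\right)
= \mathcal{N}\!\left(y \mid Ax + b, L^{-1}\right).
$$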














Question 3


Show that the covariance matrix $\Sigma$ that maximizes the log likelihood function is given by the sample covariance

$$
\Sigma_{\mathrm{ML}} = \frac{1}{N}\sum_{n=1}^{N}(x_n - \mu_{\mathrm{ML}})(x_n - \mu_{\mathrm{ML}})^{T}.
$$

Is the final result symmetric and positive definite (provided the sample covariance is nonsingular)?


Hints.

(a) To find the maximum likelihood solution for the covariance matrix of a multivariate Gaussian, we need to maximize the log likelihood function with respect to $\Sigma$. The log likelihood function is given by

$$
\ln p(X \mid \mu, \Sigma) = -\frac{ND}{2}\ln(2\pi) - \frac{N}{2}\ln|\Sigma| - \frac{1}{2}\sum_{n=1}^{N}(x_n - \mu)^{T}\Sigma^{-1}(x_n - \mu).
$$

(b) The derivative of the inverse of a matrix can be expressed as

$$
\frac{\partial}{\partial x}\left(A^{-1}\right) = -A^{-1}\frac{\partial A}{\partial x}A^{-1}.
$$

We have the following properties

$$
\frac{\partial}{\partial A}\operatorname{Tr}(A) = I, \qquad \frac{\partial}{\partial A}\ln|A| = \left(A^{-1}\right)^{T}.
$$
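One way the hints combine (a sketch; $\mu$ is taken as fixed at $\mu_{\mathrm{ML}}$, $S$ denotes the sample covariance, and the trace derivative follows from the inverse rule above):

$$
\ln p(X \mid \mu, \Sigma) = -\frac{ND}{2}\ln(2\pi) - \frac{N}{2}\ln|\Sigma| - \frac{N}{2}\operatorname{Tr}\!\left(\Sigma^{-1}S\right), \qquad S = \frac{1}{N}\sum_{n=1}^{N}(x_n-\mu)(x_n-\mu)^{T},
$$
$$
\frac{\partial}{\partial\Sigma}\ln p(X \mid \mu, \Sigma) = -\frac{N}{2}\Sigma^{-1} + \frac{N}{2}\Sigma^{-1}S\Sigma^{-1} = 0 \;\Longrightarrow\; \Sigma_{\mathrm{ML}} = S,
$$

which is symmetric by construction and positive definite whenever the sample covariance is nonsingular.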



Question 4


(a) Derive an expression for the sequential estimation of the variance of a univariate Gaussian distribution, by starting with the maximum likelihood expression


$$
\sigma^{2}_{\mathrm{ML}} = \frac{1}{N}\sum_{n=1}^{N}(x_n - \mu)^{2}.
$$

Verify that substituting the expression for a Gaussian distribution into the Robbins-Monro sequential estimation formula gives a result of the same form, and hence obtain an expression for the corresponding coefficients $a_N$.

(b) Derive an expression for the sequential estimation of the covariance of a multivariate Gaussian distribution, by starting with the maximum likelihood expression

$$
\Sigma_{\mathrm{ML}} = \frac{1}{N}\sum_{n=1}^{N}(x_n - \mu_{\mathrm{ML}})(x_n - \mu_{\mathrm{ML}})^{T}.
$$

Verify that substituting the expression for a Gaussian distribution into the Robbins-Monro sequential estimation formula gives a result of the same form, and hence obtain an expression for the corresponding coefficients $a_N$.




Hints.

(a) Consider the result $\mu_{\mathrm{ML}} = \frac{1}{N}\sum_{n=1}^{N} x_n$ for the maximum likelihood estimator of the mean $\mu_{\mathrm{ML}}$, which we will denote by $\mu_{\mathrm{ML}}^{(N)}$ when it is based on $N$ observations. If we dissect out the contribution from the final data point $x_N$, we obtain

$$
\mu_{\mathrm{ML}}^{(N)} = \frac{1}{N}\sum_{n=1}^{N} x_n
= \frac{1}{N}x_N + \frac{N-1}{N}\cdot\frac{1}{N-1}\sum_{n=1}^{N-1} x_n
= \frac{1}{N}x_N + \frac{N-1}{N}\mu_{\mathrm{ML}}^{(N-1)}
= \mu_{\mathrm{ML}}^{(N-1)} + \frac{1}{N}\left(x_N - \mu_{\mathrm{ML}}^{(N-1)}\right).
$$

(b) Robbins-Monro for maximum likelihood

$$
\theta^{(N)} = \theta^{(N-1)} + a_{N-1}\,\frac{\partial}{\partial\theta^{(N-1)}}\ln p\!\left(x_N \mid \theta^{(N-1)}\right).
$$
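A sketch of where part (a) of Question 4 is headed: rearranging the maximum likelihood expression into sequential form, just as for the mean above, and matching it against the Robbins-Monro update gives the coefficient.

$$
\sigma^{2}_{(N)} = \frac{1}{N}\sum_{n=1}^{N}(x_n-\mu)^{2}
= \sigma^{2}_{(N-1)} + \frac{1}{N}\left[(x_N-\mu)^{2} - \sigma^{2}_{(N-1)}\right],
$$

and since $\frac{\partial}{\partial\sigma^{2}}\ln p(x_N \mid \mu, \sigma^{2}) = \frac{(x_N-\mu)^{2} - \sigma^{2}}{2\sigma^{4}}$, the two updates coincide when $a_{N-1} = \frac{2\sigma^{4}_{(N-1)}}{N}$.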



Question 5


Consider a $D$-dimensional Gaussian random variable $x$ with distribution $\mathcal{N}(x \mid \mu, \Sigma)$ in which the covariance $\Sigma$ is known and for which we wish to infer the mean $\mu$ from a set of observations $X = \{x_1, x_2, \ldots, x_N\}$. Given a prior distribution $p(\mu) = \mathcal{N}(\mu \mid \mu_0, \Sigma_0)$, find the corresponding posterior distribution $p(\mu \mid X)$.
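For reference, a sketch of the standard form of the answer (the multivariate analogue of the univariate Gaussian mean posterior; here $\bar{x} = \frac{1}{N}\sum_{n=1}^{N} x_n$ denotes the sample mean):

$$
p(\mu \mid X) = \mathcal{N}(\mu \mid \mu_N, \Sigma_N), \qquad
\Sigma_N^{-1} = \Sigma_0^{-1} + N\Sigma^{-1}, \qquad
\mu_N = \Sigma_N\left(\Sigma_0^{-1}\mu_0 + N\Sigma^{-1}\bar{x}\right).
$$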

You should download the HW2_programQuestion.ipynb file first.




Program



In this coding exercise, we will implement the K-nearest Neighbors (KNN) algorithm. You are provided with a Jupyter Notebook in which you will have to fill in the functions as instructed therein. Be sure to read the notebook thoroughly for the instructions and also comment your code appropriately.

This is a classification problem and we will use the Breast Cancer dataset. This dataset is included as part of sklearn. We have loaded the dataset for you in the Jupyter notebook. Please familiarize yourself with the dataset first.
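The notebook already loads the data for you; for reference, loading it outside the notebook looks roughly like the sketch below. The train/validation/test split shown is only illustrative (the notebook provides its own splits).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load the Breast Cancer dataset bundled with scikit-learn.
data = load_breast_cancer()
X, y = data.data, data.target          # X: (569, 30) feature matrix, y: binary labels

# Illustrative split into train / validation / test sets (use the notebook's own split).
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

print(X_train.shape, X_val.shape, X_test.shape)
```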














             L1        L2        L-inf
    K = 3
    K = 5
    K = 7

Table 1: Accuracy for the KNN classification problem on the validation set

A training set (X_train) is provided which has several datapoints, and each datapoint is a p-dimensional vector (i.e., p features). You are also provided with separate validation (X_val) and test (X_test) sets. Your task is to implement the K-nearest neighbors algorithm and determine the ideal combination of the value of K and the metric norm. For this, you have to try different combinations of the metric norm and different values of K.

Compute the accuracy on the X_val set for every combination of K and metric norm. Once you have decided the ideal value of K and a relevant metric norm using the validation set, use those values to report the accuracy on the test set X_test. You have to use the following values of K: 3, 5 and 7. The different metric norms to be implemented are: L1, L2 and L-inf. Do not use any library to implement the norms.

(a) How could having a larger dataset influence the performance of KNN?

(b) Tabulate your results in Table 1 for the validation set as shown above and include that in your file.

(c) Finally, mention the best K and norm combination you have settled upon from the above table and report the accuracy on the test set using that combination.

(d) The Autograder for your code submission will grade the correctness of your implementation of the following functions as given in the Jupyter notebook: distanceFunc, computeDistancesNeighbors, Majority and KNN.
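A minimal sketch of how the functions named in (d) could fit together. The exact signatures expected by the notebook are assumptions here (follow the notebook's instructions); the norms are computed from elementwise operations rather than any norm library.

```python
import numpy as np
from collections import Counter

def distanceFunc(metric, x, y):
    """Distance between two feature vectors under the chosen norm.

    metric: one of "L1", "L2", "L-inf" (names assumed; match the notebook's spec).
    """
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    if metric == "L1":
        return diff.sum()                  # sum of absolute differences
    if metric == "L2":
        return np.sqrt((diff ** 2).sum())  # Euclidean distance
    if metric == "L-inf":
        return diff.max()                  # largest coordinate-wise difference
    raise ValueError(f"Unknown metric: {metric}")

def computeDistancesNeighbors(K, metric, X_train, y_train, sample):
    """Return the labels of the K training points closest to `sample`."""
    distances = [distanceFunc(metric, x, sample) for x in X_train]
    nearest = np.argsort(distances)[:K]    # indices of the K smallest distances
    return [y_train[i] for i in nearest]

def Majority(neighbors):
    """Majority vote among the neighbor labels (ties broken by first-seen label)."""
    return Counter(neighbors).most_common(1)[0][0]

def KNN(K, metric, X_train, y_train, X_test):
    """Predict a label for every row of X_test."""
    return np.array([Majority(computeDistancesNeighbors(K, metric, X_train, y_train, x))
                     for x in X_test])

if __name__ == "__main__":
    # Tiny synthetic check; in the notebook, use the provided X_train / X_val / X_test splits
    # and loop over K in (3, 5, 7) and metric in ("L1", "L2", "L-inf"),
    # comparing predictions against y_val to fill in Table 1.
    X_train = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
    y_train = np.array([0, 0, 1, 1])
    X_val = np.array([[0.5, 0.5], [5.5, 5.5]])
    for metric in ("L1", "L2", "L-inf"):
        print(metric, KNN(3, metric, X_train, y_train, X_val))
```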


Reference. The dataset and question are from Kaggle and University of Pennsylvania.
