1 Theory Questions
1.1 Vector Calculus
Let $f : \mathbb{R}^D \to \mathbb{R}$. Recall that the gradient of $f$ is a (column) vector of length $D$ whose $d$-th component is the derivative of $f(x)$ with respect to $x_d$, i.e., $\frac{\partial f(x)}{\partial x_d}$. The Hessian is the $D \times D$ matrix whose entry $(i, j)$ is the second derivative of $f(x)$ with respect to $x_i$ and $x_j$, i.e., $\frac{\partial^2 f(x)}{\partial x_i \partial x_j}$.
Let $f : \mathbb{R}^D \to \mathbb{R}$ be the function $f(x) = x^\top A x + b^\top x + c$, where $A$ is a $D \times D$ matrix, $b$ is a vector of length $D$, and $c$ is a constant.
Determine the gradient of $f$, $\nabla f(x)$.
Determine the Hessian of $f$, $\nabla^2 f(x)$.
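Once you have an expression, a finite-difference comparison is a quick way to validate it numerically. Below is a minimal NumPy sketch (our own illustration, not part of the handout; the final lines contain one correct form of the gradient, so attempt the derivation before reading them):

```python
import numpy as np

def f(x, A, b, c):
    """f(x) = x^T A x + b^T x + c."""
    return x @ A @ x + b @ x + c

def finite_diff_grad(x, A, b, c, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    g = np.zeros_like(x)
    for d in range(len(x)):
        e = np.zeros_like(x)
        e[d] = eps
        g[d] = (f(x + e, A, b, c) - f(x - e, A, b, c)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
D = 5
A = rng.standard_normal((D, D))   # deliberately not symmetric
b = rng.standard_normal(D)
c, x = 1.0, rng.standard_normal(D)

analytic = (A + A.T) @ x + b      # spoiler: the gradient for a general A
print(np.allclose(analytic, finite_diff_grad(x, A, b, c), atol=1e-4))
```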
1.2 Maximum Likelihood Principle
Assume we are given i.i.d. samples $X_1, \dots, X_N \in \mathbb{R}$ drawn from a Gaussian distribution with mean $\mu$ and variance $\sigma^2$. We do not know the two parameters $\mu, \sigma$, and want to estimate them from the data using the maximum likelihood principle.
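Recall that the density of a single Gaussian sample is
$$p_{\mu,\sigma^2}(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),$$
and that the i.i.d. assumption lets you factor the joint distribution of all $N$ samples into a product of such terms.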
Write down the likelihood for this data, i.e., the joint distribution $p_{\mu,\sigma^2}(X_1, \dots, X_N)$, where the subscripts $\mu$ and $\sigma^2$ remind us that this distribution depends on these two parameters.
Use the maximum likelihood principle to estimate the two parameters $\mu$ and $\sigma^2$.
More precisely, take the gradient of the joint distribution with respect to the two parameters and set it to $0$; it is usually easier to work with the logarithm of the likelihood, which has the same maximizers. Then solve the two equations for $\mu$ and $\sigma^2$. If you do not know some quantity in the resulting expression, replace it with its estimate. This gives you two estimators for the two parameters as functions of the data, which we call $\hat{\mu}(X_1, \dots, X_N)$ and $\hat{\sigma}^2(X_1, \dots, X_N)$.
Compute $\mathbb{E}[\hat{\mu}]$. Is this equal to the true parameter $\mu$?
Compute $\mathbb{E}[\hat{\sigma}^2]$. Is this equal to the true parameter $\sigma^2$?
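To check your answers to the last two questions empirically, here is a small NumPy simulation (our own illustration, not part of the handout; it spells out the standard ML estimators, so treat it as a spoiler):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2 = 2.0, 4.0    # ground-truth parameters
N = 10                   # a small N makes any bias easy to see
trials = 100_000

# Each row is one simulated dataset of N Gaussian samples.
X = rng.normal(mu, np.sqrt(sigma2), size=(trials, N))
mu_hat = X.mean(axis=1)                                  # ML estimate of the mean
sigma2_hat = ((X - mu_hat[:, None]) ** 2).mean(axis=1)   # ML estimate of the variance

print("E[mu_hat]     ≈", mu_hat.mean())      # compare with mu = 2.0
print("E[sigma2_hat] ≈", sigma2_hat.mean())  # compare with sigma2 = 4.0; is it biased?
```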
2 Implementing K-Means
Goals. The goals of this exercise are to:
- Implement and visualize K-means clustering on the faithful dataset, and study its behavior with respect to the number of clusters K.
- Implement data compression using K-means.
Setup, data and sample code. Obtain the folder labs/ex08 of the course GitHub repository
github.com/epfml/ML_course
We will use the dataset faithful.csv in this exercise, and we have provided sample code templates that already contain useful snippets of code required for this exercise.
We will reproduce Figure 9.1 of Bishop's book.
Exercise 2a):
Let's first implement the K-means algorithm on the faithful dataset.
- Fill in the code to initialize the cluster centers.
- Write the function kmeansUpdate to update the assignments z, the means $\mu$, and the distances of the data points to the means. Your code should work for any number of clusters K (not just K = 2).
- Write code to test for convergence.
- Visualize the output. You should get figures similar to Figure 1. (A minimal reference sketch of the whole loop follows the figure.)
Figure 1: K-means on the faithful data. Panels (a)-(d) show iterations 0 through 3.
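For reference, here is a minimal NumPy sketch of the whole loop. It is our own illustration under assumed names: the provided templates use a separate kmeansUpdate function and their own interfaces, so adapt rather than copy:

```python
import numpy as np

def kmeans(X, K, max_iters=100, seed=0):
    """Plain K-means on an (N, D) data matrix. Returns assignments z and means mu."""
    rng = np.random.default_rng(seed)
    # Initialization: use K distinct data points as the initial cluster centers.
    mu = X[rng.choice(len(X), size=K, replace=False)]
    z = None
    for _ in range(max_iters):
        # Squared distance of every point to every mean: an (N, K) matrix.
        dist = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        z_new = dist.argmin(axis=1)                    # assignment step
        if z is not None and np.array_equal(z_new, z):
            break                                      # converged: assignments unchanged
        z = z_new
        for k in range(K):                             # update step
            if np.any(z == k):                         # skip clusters that lost all points
                mu[k] = X[z == k].mean(axis=0)
    return z, mu
```

Checking that the assignments no longer change is one simple convergence test; monitoring the decrease of the cost $\sum_n \|x_n - \mu_{z_n}\|^2$ and stopping once it falls below a tolerance works just as well.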
Exercise 2b):
Now, play with the initial conditions and the number of clusters to understand the behavior of K-means.
- Change the initial conditions and observe how convergence changes. The algorithm must converge for all possible initial conditions; if it does not, there is a problem in your implementation.
- Try different values of K, as well as different initial conditions for each. Look at the value of the cost function as K increases (a small sketch for this comparison follows the bonus question below).
BONUS: What is a good value for K? How will you choose it?
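A small sketch for the cost-versus-K comparison, reusing the kmeans function from the previous sketch (the loading line assumes a particular CSV layout; adjust it to the actual file):

```python
import numpy as np

def cost(X, z, mu):
    """K-means objective: total squared distance of each point to its assigned mean."""
    return ((X - mu[z]) ** 2).sum()

# Assumed format: two numeric columns with a header row; adjust to the actual file.
X = np.loadtxt("faithful.csv", delimiter=",", skiprows=1)

for K in range(1, 7):
    z, mu = kmeans(X, K)   # kmeans from the sketch above
    print(f"K = {K}: final cost = {cost(X, z, mu):.2f}")
```

Note that the cost can only go down as K increases (with K = N it reaches zero), so the raw cost alone cannot select K; that trade-off is exactly what the bonus question is about.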
3 Data Compression using K-Means
We will implement data compression using K-means, similar to the examples shown in class.
Exercise 3:
- Implement data compression for mandrill.png. Your output should look like Figure 2.
- Run K-means with random initializations and observe the convergence.
- Plot the reconstructed image by setting each pixel's value to the mean value of its cluster.
- Play with the number of clusters and compare the amount of compression in the resulting image. (A sketch follows Figure 2.)
Figure 2: Image quantization / compression using K-means.
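One possible structure for the compression pipeline, again reusing the kmeans sketch from above (the file name comes from the handout; everything else is our own illustration):

```python
import numpy as np
import matplotlib.pyplot as plt

img = plt.imread("mandrill.png")   # (H, W, 3) or (H, W, 4) array, values in [0, 1]
H, W, C = img.shape
pixels = img.reshape(-1, C)        # one row per pixel

K = 16                             # number of colors after compression
z, mu = kmeans(pixels, K)          # kmeans from the earlier sketch

# Reconstruct: every pixel is replaced by the mean color of its cluster.
compressed = mu[z].reshape(H, W, C)

plt.imshow(compressed)
plt.axis("off")
plt.show()
```

The compression comes from storing only the K mean colors plus one small cluster index per pixel, instead of a full color value per pixel.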