CS559-B HW3

Problem 1 (25pt): [K-means] Implement the K-means algorithm. Note that you cannot directly call the built-in kmeans functions.

Figure 1: Scatter plot of datasets and the initialized centers of 3 clusters

Given the matrix X whose rows represent different data points, you are asked to perform a k-means clustering on this dataset using the Euclidean distance as the distance function. Here k is chosen as 3. The Euclidean distance d between vectors x and y, both in R^p, is defined as

d(x, y) = √( ∑_{i=1}^{p} (x_i − y_i)² )

All data in X are plotted in Figure 1. The centers of the 3 clusters were


initialized as µ1 = (6.2, 3.2) (red), µ2 = (6.6, 3.7) (green), µ3 = (6.5, 3.0) (blue).

X = [ 5.9 3.2
      4.6 2.9
      6.2 2.8
      4.7 3.2
      5.5 4.2
      5.0 3.0
      4.9 3.1
      6.7 3.1
      5.1 3.8
      6.0 3.0 ]

(1) [10pt] What’s the center of the first cluster (red) after one iteration? (Answer in the format [x1, x2]; round results to three decimal places, same for parts (2) and (3).)

(2) [5pt] What’s the center of the second cluster (green) after two iterations?

(3) [5pt] What’s the center of the third cluster (blue) when the clustering converges?

(4) [5pt] How many iterations are required for the clusters to converge?
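A from-scratch K-means loop (no built-in kmeans call, as the problem requires) can be sketched as follows. The data matrix and initial centers are the ones given above; the convergence test used here (assignments no longer change) is one common choice, and `max_iter` is an illustrative safeguard.

```python
import numpy as np

# Data matrix X from the problem statement (one point per row).
X = np.array([[5.9, 3.2], [4.6, 2.9], [6.2, 2.8], [4.7, 3.2], [5.5, 4.2],
              [5.0, 3.0], [4.9, 3.1], [6.7, 3.1], [5.1, 3.8], [6.0, 3.0]])

# Initial centers: mu1 (red), mu2 (green), mu3 (blue).
init_centers = np.array([[6.2, 3.2], [6.6, 3.7], [6.5, 3.0]])

def kmeans(X, centers, max_iter=100):
    """Plain K-means with Euclidean distance.

    Returns the final centers, the assignments, and the number of
    update iterations performed before the assignments stopped changing.
    """
    centers = centers.copy()
    labels = np.full(len(X), -1)
    for it in range(1, max_iter + 1):
        # Assignment step: nearest center under Euclidean distance.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            return centers, labels, it - 1  # converged: nothing moved
        labels = new_labels
        # Update step: each center becomes the mean of its points.
        for k in range(len(centers)):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return centers, labels, max_iter

centers, labels, n_iter = kmeans(X, init_centers)
print(np.round(centers, 3), n_iter)
```

Passing `max_iter=1` to the same function yields the centers after exactly one assignment/update iteration, which is what parts (1) and (2) ask about.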

Problem 2 (15pt): [Latent variable model and GMM] Consider the discrete latent variable model where the latent variable z uses a 1-of-K representation. The distribution of the latent variable z is defined as:

p(zk = 1) = πk

where {πk} satisfy 0 ≤ πk ≤ 1 and ∑_{k=1}^{K} πk = 1. Suppose the conditional distribution of observation x given a particular value of z is Gaussian:

p(x | zk = 1) = N(x | µk, Σk)

(1) [5pt] Write down the compact form of p(z) and p(x|z).

(2) [5pt] Show that the marginal distribution p(x) has the following form:

p(x) = ∑_{k=1}^{K} πk N(x | µk, Σk)

(3) [5pt] If we want to find the MLE solution for the parameters πk, µk, Σk in such a model, what algorithm should we use? Briefly describe its major difference compared to the K-means algorithm.
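The generative story behind this model (first draw z from the categorical prior, then draw x from the Gaussian that z selects) can be sketched as below. The mixing weights, means, and covariances here are made-up illustrative values, not part of the problem.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters for illustration only (K = 2 components in R^2).
pi = np.array([0.4, 0.6])                       # p(z_k = 1) = pi_k
mu = np.array([[0.0, 0.0], [3.0, 3.0]])         # component means
Sigma = np.array([np.eye(2), 0.5 * np.eye(2)])  # component covariances

def sample(n):
    """Ancestral sampling: z ~ Categorical(pi), then x ~ N(mu_z, Sigma_z)."""
    zs = rng.choice(len(pi), size=n, p=pi)      # 1-of-K latent, stored as an index
    xs = np.array([rng.multivariate_normal(mu[z], Sigma[z]) for z in zs])
    return zs, xs

zs, xs = sample(1000)
# Marginalizing z analytically gives the mixture density in part (2),
# p(x) = sum_k pi_k N(x | mu_k, Sigma_k); empirically, the fraction of
# samples drawn from each component approaches pi_k.
print(np.bincount(zs, minlength=2) / len(zs))
```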

Problem 3 (20pt): [Bayesian Network] Suppose we are given 5 random variables, A1, A2, B1, B2, B3. A1 and A2 are marginally independent, and B1, B2, B3 depend on A1, A2 as follows: B1 depends on A2; B2 depends on A2, A1, and B1; B3 depends on A1. All 5 random variables are binary, i.e., Ai, Bj ∈ {0, 1}, i = 1, 2; j = 1, 2, 3.

(1) [5pt] Draw the corresponding Bayesian network.

(2) [5pt] Based on the Bayesian network in (1), write down the joint distribution p(A1, A2, B1, B2, B3).

(3) [5pt] How many independent parameters are needed to fully specify the joint distribution in (2)?

(4) [5pt] Suppose we do not have any independence assumption. Write down one possible factorization of p(A1, A2, B1, B2, B3), and state how many independent parameters are required to describe the joint distribution in this case.
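For the parameter-counting parts, recall that a binary node with m binary parents needs 2^m independent parameters (one value of P(node = 1 | parent configuration) per configuration), and a root binary node needs 1. A small helper makes the bookkeeping explicit; the three-node chain below is a hypothetical example, not the network from this problem.

```python
def num_independent_params(parent_counts):
    """Count independent parameters of a Bayesian network over binary
    variables, given each node's number of (binary) parents.

    A binary node with m binary parents contributes 2**m parameters:
    one P(node = 1 | parents = config) per parent configuration.
    """
    return sum(2 ** m for m in parent_counts.values())

# Hypothetical chain A -> B -> C: A has 0 parents, B and C have 1 each.
chain = {"A": 0, "B": 1, "C": 1}
print(num_independent_params(chain))  # 1 + 2 + 2 = 5

# With no independence assumptions (part (4)-style), a joint over n
# binary variables needs 2**n - 1 independent parameters.
n = 3
print(2 ** n - 1)  # 7
```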

Problem 4 (40 pt) [Neural Networks]

Build a neural network with one hidden layer to predict class labels for the Iris plant dataset (https://archive.ics.uci.edu/ml/datasets/iris).

(1) [30pt] Properly split the data into training and testing sets, and report the training and testing accuracy. You can use the sigmoid activation function and select any reasonable number of hidden units. Note that for this part, you need to implement the forward/backward functions yourself without using a deep learning package. However, you may use a deep learning package, e.g., TensorFlow, PyTorch, MatConvNet, etc., to compare against your own results.

(2) [10pt] Try different designs of the neural network, compare with part (1), and report your findings. This is an open-ended question; you can change the previous model in several ways, e.g., (1) change the activation function to tanh, ReLU, etc., or (2) build a more complex neural network by introducing more layers, or many other options. Note that for this part, you are allowed to use deep learning packages.
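A minimal hand-written forward/backward pass of the kind part (1) asks for can be sketched as below. To keep the example self-contained it trains on a synthetic two-class dataset rather than downloading Iris; the hidden size, learning rate, epoch count, and loss (mean squared error for simplicity; cross-entropy is the more common choice) are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic stand-in for Iris: two Gaussian blobs in R^4, binary labels.
n = 100
X = np.vstack([rng.normal(0.0, 1.0, (n, 4)), rng.normal(2.0, 1.0, (n, 4))])
y = np.concatenate([np.zeros(n), np.ones(n)]).reshape(-1, 1)

# One hidden layer with H units (illustrative size), sigmoid everywhere.
H, lr = 8, 1.0
W1 = rng.normal(0, 0.1, (4, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 0.1, (H, 1)); b2 = np.zeros(1)

for epoch in range(2000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)        # hidden activations
    out = sigmoid(h @ W2 + b2)      # predicted probability of class 1
    # Backward pass for mean-squared-error loss L = mean((out - y)^2).
    d_out = 2 * (out - y) / len(X) * out * (1 - out)  # dL/d(output pre-activation)
    dW2 = h.T @ d_out; db2 = d_out.sum(axis=0)
    d_h = d_out @ W2.T * h * (1 - h)                  # dL/d(hidden pre-activation)
    dW1 = X.T @ d_h;  db1 = d_h.sum(axis=0)
    # Gradient-descent step.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

acc = ((out > 0.5) == (y > 0.5)).mean()
print(f"training accuracy: {acc:.3f}")
```

For the real assignment you would replace the synthetic blobs with the Iris features and labels (a multi-class output needs one output unit per class) and hold out a test split before training.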
