Assignment 4 Solution

Instructions: Submit two files only: writeup.pdf and code.zip. Only questions 3 and 4 are programming questions; written answers are expected for all other questions. For all questions, show all your work, including intermediate steps.




1 K-Means and Hierarchical Clustering [20 pts]




(a) Hierarchical Clustering [10 pts]




The following table shows 6 major cities (single-letter abbreviations in brackets) and their Movehub city ranking data (https://www.movehub.com/city-rankings/), based on the cost of a cappuccino, cinema tickets, wine, and gasoline.




City            Cappuccino   Cinema   Wine    Gasoline
Zurich (Z)      3.28         12.59    8.4     1.31
Hong Kong (H)   2.78         5.89     10.1    1.52
Toronto (T)     2.23         8.29     9.5     0.83
Lisbon (L)      1.02         5.11     2.98    1.38
Delhi (D)       0.84         2.41     7.24    0.84
Chicago (C)     2.45         7.85     7.85    0.71



Cluster the data using hierarchical clustering with single linkage by hand. Use the data as is, without normalization. To begin, set up the similarity matrix using Euclidean distance. Show your steps for each iteration: (1) circle the closest pair of clusters in the similarity matrix, (2) show the updated dendrogram, (3) show the updated similarity matrix. After the entire clustering procedure, produce TWO clusters using the final dendrogram. What are the two clusters and the distance between them?
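If you want to sanity-check your hand computation afterwards, the sketch below reproduces the procedure with scipy (an optional check, not a substitute for the required hand solution). It assumes the raw, unnormalized feature values from the table above; the merge distances in the linkage matrix should match your iterations.

    # Optional check: single-linkage clustering of the six cities with scipy.
    # Uses the raw (unnormalized) values from the table above.
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import pdist

    cities = ['Z', 'H', 'T', 'L', 'D', 'C']
    X = np.array([
        [3.28, 12.59, 8.40, 1.31],   # Zurich
        [2.78,  5.89, 10.10, 1.52],  # Hong Kong
        [2.23,  8.29, 9.50, 0.83],   # Toronto
        [1.02,  5.11, 2.98, 1.38],   # Lisbon
        [0.84,  2.41, 7.24, 0.84],   # Delhi
        [2.45,  7.85, 7.85, 0.71],   # Chicago
    ])

    D = pdist(X, metric='euclidean')    # pairwise Euclidean distances
    Z = linkage(D, method='single')     # one merge per row: cluster i, cluster j, distance, size
    print(Z)
    labels = fcluster(Z, t=2, criterion='maxclust')  # cut the dendrogram into TWO clusters
    print(dict(zip(cities, labels)))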



(b) K-Means Clustering [10 pts]




Cluster the following 6 data points by hand using K-means clustering with k = 2:




x1 = 2; x2 = 4; x3 = 5; x4 = 12; x5 = 18; x6 = 20




Do the clustering twice: once starting from initial cluster centers c1 = 0 and c2 = 19, and a second time starting from initial cluster centers c1 = 18 and c2 = 19. Show your steps for each iteration, including (1) the points assigned to each cluster, (2) the updated cluster centers c1 and c2, and (3) the energy at the end of that iteration. Energy is defined as the sum of squared distances between each data point and its assigned center; it is, in essence, the objective K-means clustering tries to minimize. Which of the two k-means solutions is better? Why?
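To sanity-check your iterations, the sketch below runs Lloyd's algorithm on the six points with both initializations and prints the assignments, centers, and energy after each iteration (again, an optional check rather than the hand solution itself).

    # Optional check: 1-D k-means (Lloyd's algorithm) on the six points above.
    import numpy as np

    def kmeans_1d(x, centers, max_iter=20):
        x = np.asarray(x, dtype=float)
        c = np.asarray(centers, dtype=float)
        for it in range(max_iter):
            # Assignment step: each point joins its nearest center.
            assign = np.argmin(np.abs(x[:, None] - c[None, :]), axis=1)
            # Update step: each center moves to the mean of its points.
            new_c = np.array([x[assign == k].mean() if np.any(assign == k) else c[k]
                              for k in range(len(c))])
            energy = np.sum((x - new_c[assign]) ** 2)   # sum of squared distances
            print(f"iter {it + 1}: assignments={assign}, centers={new_c}, energy={energy:.2f}")
            if np.allclose(new_c, c):
                break
            c = new_c
        return c

    x = [2, 4, 5, 12, 18, 20]
    kmeans_1d(x, centers=[0, 19])    # first initialization
    kmeans_1d(x, centers=[18, 19])   # second initialization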




2 EM Algorithm [30 pts]




There are two biased coins C1 and C2, with probabilities p1 and p2 of coming up heads respectively. To figure out what p1 and p2 are, you collect some data by repeating the following experiment n times: pick a coin from the bag, toss it, observe the outcome (heads or tails), and return it to the bag. Assume that the coin tosses are independent, and that you choose C1 with probability π and C2 with probability 1 − π.




(a) Suppose that the two coins have slightly different designs, so you can identify which coin is which. What are the maximum likelihood estimators for the two parameters, p1 and p2?



(b) Suppose that the two coins have exactly the same design, so you cannot tell them apart (i.e., the identity of the coin is missing). Write out the expression for the expected log-likelihood of the data.



(c) Suppose you start with some guesses for the parameters: π̂, p̂1, and p̂2. Show the E-step and M-step of the Expectation Maximization (EM) algorithm.
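For reference, here is a hedged sketch of the form these updates usually take for a two-component Bernoulli mixture; it assumes π denotes the probability of picking C1, x_i ∈ {0, 1} is the outcome of toss i (1 = heads), and γ_i is the posterior probability that toss i came from C1. Your derivation should justify each line rather than quote it.

    % Sketch of the E- and M-steps for a two-component Bernoulli mixture.
    % Assumes \pi is the probability of drawing C1, x_i \in {0,1} (1 = heads),
    % and \gamma_i is the responsibility of C1 for toss i.
    \begin{align*}
    \text{E-step:}\quad
    \gamma_i &= \frac{\hat{\pi}\,\hat{p}_1^{\,x_i}(1-\hat{p}_1)^{1-x_i}}
                    {\hat{\pi}\,\hat{p}_1^{\,x_i}(1-\hat{p}_1)^{1-x_i}
                     + (1-\hat{\pi})\,\hat{p}_2^{\,x_i}(1-\hat{p}_2)^{1-x_i}} \\[4pt]
    \text{M-step:}\quad
    \hat{\pi} &\leftarrow \frac{1}{n}\sum_{i=1}^{n}\gamma_i, \qquad
    \hat{p}_1 \leftarrow \frac{\sum_{i}\gamma_i x_i}{\sum_{i}\gamma_i}, \qquad
    \hat{p}_2 \leftarrow \frac{\sum_{i}(1-\gamma_i)x_i}{\sum_{i}(1-\gamma_i)}
    \end{align*}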























3 PCA and Eigenfaces [20 pts]




For this question, you are asked to create a program that uses PCA to reduce the dimensionality of face images for face recognition.




It is beneficial to use Matlab or Python for this question, since both provide standard packages for displaying images, computing eigenvectors, and machine learning, which you are welcome to use. For example, if you are using MATLAB, you can load the images using data = csvread('X_train.csv'), display the i-th image using colormap(gray); imagesc(reshape(data(i,:),112,92)), and compute the k largest eigenvectors using eigs. In Python, you can load the images using numpy.genfromtxt('X_train.csv', delimiter=','), display the i-th image using matplotlib.pyplot.imshow(np.reshape(X_train[i,:],(92,112)), cmap="gray"), and compute eigenvectors using scipy.sparse.linalg.eigs.







Download faces.zip from the course website. This dataset, from the ORL Database of Faces (https://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html), contains 400 face images: 10 different images of each of 40 individuals, taken under different lighting conditions, with different facial expressions and details. The dataset is already split into a training set (40 individuals × 9 images each) and a test set (40 individuals × 1 image each). In X_train.csv and X_test.csv, each row contains the features for a 112 × 92 face image. Y_train.csv and Y_test.csv contain the labels for the corresponding images.







(a) Visualizing Eigenfaces [5 pts]




Compute the mean of the 360 training faces, and subtract the mean from each training data point. Compute the covariance matrix for the 360 training faces, then find the 50 eigenvectors with the largest eigenvalues. Display the eigenvectors corresponding to the 10 largest eigenvalues as 112 × 92 images.
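A minimal sketch of one way to implement part (a) in Python, assuming X_train.csv is laid out as described in the preliminary notes (360 rows of flattened 112 × 92 images) and following the display convention given there:

    # Sketch: mean face, covariance, top-50 eigenvectors, top-10 eigenface display.
    # Assumes X_train.csv holds 360 rows of flattened 112x92 face images.
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.sparse.linalg import eigsh   # symmetric variant of eigs

    X = np.genfromtxt('X_train.csv', delimiter=',')   # shape (360, 10304)
    mean = X.mean(axis=0)
    Xc = X - mean                                     # center the training faces

    C = (Xc.T @ Xc) / Xc.shape[0]                     # 10304 x 10304 covariance matrix
    vals, vecs = eigsh(C, k=50, which='LM')           # 50 largest eigenpairs
    order = np.argsort(vals)[::-1]                    # sort eigenpairs descending
    vals, vecs = vals[order], vecs[:, order]

    for i in range(10):                               # eigenfaces of the 10 largest eigenvalues
        plt.subplot(2, 5, i + 1)
        plt.imshow(vecs[:, i].reshape(92, 112), cmap='gray')  # reshape per the notes above
        plt.axis('off')
    plt.show()

Since there are only 360 training images, an equivalent and much cheaper alternative is to eigendecompose the 360 × 360 matrix Xc @ Xc.T and map its eigenvectors back through Xc.T (normalizing afterwards); both routes span the same subspace.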




(b) Face Reconstruction [5 pts]




Project the data points into the subspace spanned by the 50 eigenvectors you computed. To do the projection, compute u_i = x^T v_i, where v_i is the i-th eigenvector. To reconstruct the face images, compute x' = Σ_i u_i v_i. For the face images corresponding to lines 1, 5, 20, 30, and 40 in X_train.csv, display the original image and the reconstructed image side by side. Include the screenshots in your writeup.
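Continuing the part (a) sketch (and reusing X, mean, Xc, and vecs from it), the projection and reconstruction amount to two matrix products; note that x' = Σ_i u_i v_i reconstructs the centered face, so the mean is added back for display:

    # Sketch: project onto the 50 eigenvectors, then reconstruct.
    # Reuses X, mean, Xc, and vecs from the part (a) sketch.
    import matplotlib.pyplot as plt

    U = Xc @ vecs                 # row i holds u = x^T v for all 50 eigenvectors
    X_rec = U @ vecs.T + mean     # x' = sum_i u_i v_i, plus the mean for display

    for row, line in enumerate([1, 5, 20, 30, 40]):   # 1-indexed lines in X_train.csv
        i = line - 1
        plt.subplot(5, 2, 2 * row + 1)
        plt.imshow(X[i].reshape(92, 112), cmap='gray'); plt.axis('off')      # original
        plt.subplot(5, 2, 2 * row + 2)
        plt.imshow(X_rec[i].reshape(92, 112), cmap='gray'); plt.axis('off')  # reconstruction
    plt.show()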







(c) Face Recognition [10 pts]




Use the projected images (i.e., the 50-dimensional feature vectors) for face recognition. First, process the test data: subtract from each data point in X_test.csv the mean you computed on X_train.csv, and project each data point in X_test.csv into the subspace spanned by the 50 eigenvectors you found previously. Then, apply a 1-Nearest Neighbor (1NN) classifier in this 50-dimensional space to predict labels for the faces in X_test.csv. Feel free to use any standard machine learning packages for 1NN. Report your test classification accuracy.
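One hedged way to finish part (c), again reusing X, mean, and vecs from the earlier sketches and using scikit-learn as the standard package for 1NN:

    # Sketch: 1NN face recognition in the 50-dimensional eigenface space.
    # Reuses X, mean, and vecs from the part (a) sketch.
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    X_test = np.genfromtxt('X_test.csv', delimiter=',')
    y_train = np.genfromtxt('Y_train.csv', delimiter=',')
    y_test = np.genfromtxt('Y_test.csv', delimiter=',')

    U_train = (X - mean) @ vecs       # training projections, shape (360, 50)
    U_test = (X_test - mean) @ vecs   # center the test data with the TRAINING mean

    clf = KNeighborsClassifier(n_neighbors=1).fit(U_train, y_train)
    print('test accuracy:', clf.score(U_test, y_test))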







4 Neural Network [30 pts]




For this question, you will experiment with fully connected neural networks and convolutional neural networks, using the Keras open source package. Keras is one of the simplest deep learning packages; it serves as a wrapper on top of TensorFlow, CNTK, and Theano. Preliminary steps:




Download and install Keras from https://keras.io/. A CPU installation is sufficient for this assignment. Click on “Getting Started” and read the “Guide to the Sequential Model”.




Download the file cifar10_cnn.py from the examples folder: https://github.com/keras-team/keras/tree/master/examples.







Answer the following questions by modifying the code in cifar10_cnn.py.







Dense vs Convolutional Neural Net [10 points]. Compare the accuracy of the convolutional neural network in the file cifar10_cnn.py on the CIFAR-10 dataset to the accuracy of simple dense neural networks with 0, 1, 2, 3, and 4 hidden layers of 512 rectified linear units each. Modify the code in cifar10_cnn.py to obtain simple dense neural networks with 0, 1, 2, 3, and 4 hidden layers of 512 rectified linear units (with a dropout rate of 0.5). Produce a graph that contains 6 curves (one for the convolutional neural net and one for each dense neural net of 0-4 hidden layers). The y-axis is the test (validation) accuracy and the x-axis is the number of epochs (# of passes through the training set). Produce curves for the first 10 epochs. Although 10 epochs is not sufficient to reach convergence, it is sufficient to see the trend. Explain the results (i.e., why some models perform better or worse than other models). There is no need to submit your code, since the modifications are simple.
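One way to get the dense baselines, sketched below under the assumption that the data loading, compilation, and training code in cifar10_cnn.py stay as they are, is a small model builder (the function name dense_model is just for illustration); n_hidden = 0 gives plain softmax regression on the flattened pixels.

    # Sketch: dense net with n_hidden layers of 512 ReLU units and 0.5 dropout,
    # intended to replace the Sequential model construction in cifar10_cnn.py.
    from keras.models import Sequential
    from keras.layers import Dense, Dropout, Flatten

    def dense_model(n_hidden, input_shape=(32, 32, 3), num_classes=10):
        model = Sequential()
        model.add(Flatten(input_shape=input_shape))    # flatten the 32x32x3 images
        for _ in range(n_hidden):                      # n_hidden in {0, 1, 2, 3, 4}
            model.add(Dense(512, activation='relu'))
            model.add(Dropout(0.5))
        model.add(Dense(num_classes, activation='softmax'))
        return model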









Rectified Linear vs Sigmoid Units [10 pts]. Compare the accuracy achieved by rectified linear units and sigmoid units in the convolutional neural network in cifar10_cnn.py. Modify the code in cifar10_cnn.py to use sigmoid units. Produce a graph that contains 2 curves (one for rectified linear units and one for sigmoid units). The y-axis is the test (validation) accuracy and the x-axis is the number of epochs (# of passes through the training set). Produce curves for the first 30 epochs. Explain the results (i.e., why one model performed better than the other). There is no need to submit your code, since the modifications are simple.
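Since only the nonlinearity changes between the two runs, one convenient modification, sketched here with layer sizes assumed to follow the conv net in cifar10_cnn.py, is to parameterize the activation:

    # Sketch: the same conv net built with a chosen activation, so the ReLU and
    # sigmoid runs differ by a single argument. Layer sizes assume the
    # architecture of the cifar10_cnn.py example (two conv blocks, then Dense 512).
    from keras.models import Sequential
    from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

    def conv_model(act, input_shape=(32, 32, 3), num_classes=10):
        model = Sequential()
        model.add(Conv2D(32, (3, 3), padding='same', activation=act,
                         input_shape=input_shape))
        model.add(Conv2D(32, (3, 3), activation=act))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.25))
        model.add(Conv2D(64, (3, 3), padding='same', activation=act))
        model.add(Conv2D(64, (3, 3), activation=act))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.25))
        model.add(Flatten())
        model.add(Dense(512, activation=act))
        model.add(Dropout(0.5))
        model.add(Dense(num_classes, activation='softmax'))
        return model

    relu_net = conv_model('relu')        # baseline
    sigmoid_net = conv_model('sigmoid')  # comparison run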



Dropout and Data Augmentation [10 pts]. Compare the accuracy achieved with and without dropout, as well as with and without data augmentation, in the convolutional neural network in cifar10_cnn.py. Modify the code in cifar10_cnn.py to turn dropout and data augmentation on and off. In the fit_generator function, set steps_per_epoch=len(x_train)/batch_size to ensure that the number of images per epoch is the same. Produce two graphs (one for training accuracy and the other for test accuracy) that each contain 4 curves (all four combinations of dropout on/off and data augmentation on/off). The y-axis is the accuracy (i.e., train or test/validation accuracy) and the x-axis is the number of epochs (# of passes through the training set). Produce curves for as many epochs as you can, up to 100 epochs. Explain the results (i.e., why some models performed better or worse than others, and whether the results are consistent with the theory). No marks will be deducted for running fewer than 100 epochs, but make sure to explain what you expect to see in the curves as the number of epochs approaches 100. There is no need to submit your code, since the modifications are simple.
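A sketch of the augmentation toggle, assuming model, x_train, y_train, x_test, y_test, and batch_size already exist as in cifar10_cnn.py (dropout is toggled separately, by building the model with or without its Dropout layers); the augmentation settings shown mirror the shifts and flips used in the example file:

    # Sketch: train with or without data augmentation while keeping the number
    # of images per epoch fixed. Assumes model, x_train, y_train, x_test,
    # y_test, and batch_size are defined as in cifar10_cnn.py.
    from keras.preprocessing.image import ImageDataGenerator

    use_augmentation = True
    epochs = 100

    if use_augmentation:
        datagen = ImageDataGenerator(width_shift_range=0.1,
                                     height_shift_range=0.1,
                                     horizontal_flip=True)
        model.fit_generator(datagen.flow(x_train, y_train, batch_size=batch_size),
                            steps_per_epoch=len(x_train) // batch_size,  # len(x_train)/batch_size, as an integer
                            epochs=epochs,
                            validation_data=(x_test, y_test))
    else:
        model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs,
                  validation_data=(x_test, y_test))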

































































































































































