Homework #6 Solution

(20 points) Let $A = U \Sigma V^T$ be the SVD of $A$, where $A \in \mathbb{R}^{m \times n}$, $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ are orthogonal matrices, $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_r, 0, \ldots, 0)$, and $r = \mathrm{rank}(A)$. Show that



The first $r$ columns of $U$ are eigenvectors of $AA^T$ corresponding to its nonzero eigenvalues.
The first $r$ columns of $V$ are eigenvectors of $A^T A$ corresponding to its nonzero eigenvalues.
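One possible starting point (a hedged hint, not part of the required write-up): use the orthogonality of $U$ and $V$ to simplify the products
\[
AA^T = U \Sigma V^T V \Sigma^T U^T = U (\Sigma \Sigma^T) U^T,
\qquad
A^T A = V \Sigma^T U^T U \Sigma V^T = V (\Sigma^T \Sigma) V^T,
\]
so that $AA^T u_i = \sigma_i^2 u_i$ and $A^T A v_i = \sigma_i^2 v_i$ for $i = 1, \ldots, r$, where $u_i$ and $v_i$ denote the $i$-th columns of $U$ and $V$, respectively.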



(10 points) Given a symmetric matrix $A \in \mathbb{R}^{3 \times 3}$, suppose its eigen-decomposition can be
written as

\[
A =
\begin{pmatrix}
u_{11} & u_{12} & u_{13} \\
u_{21} & u_{22} & u_{23} \\
u_{31} & u_{32} & u_{33}
\end{pmatrix}
\begin{pmatrix}
3 & 0 & 0 \\
0 & 2 & 0 \\
0 & 0 & 1
\end{pmatrix}
\begin{pmatrix}
u_{11} & u_{21} & u_{31} \\
u_{12} & u_{22} & u_{32} \\
u_{13} & u_{23} & u_{33}
\end{pmatrix}.
\tag{1}
\]




What is the singular value decomposition of this matrix?
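A hedged observation that may help (a sketch, not the required answer): a symmetric eigen-decomposition $A = U \Lambda U^T$ with orthonormal $U$ is already an SVD once the diagonal entries are made nonnegative and sorted in decreasing order. In general, for nonzero $\lambda_i$,
\[
A = U \Lambda U^T
\quad\Longrightarrow\quad
A = U \,\mathrm{diag}(|\lambda_1|, |\lambda_2|, |\lambda_3|)\, \tilde{V}^T,
\qquad
\tilde{v}_i = \mathrm{sign}(\lambda_i)\, u_i ,
\]
with the columns of $U$ and $\tilde{V}$ reordered together so that the singular values $|\lambda_i|$ appear in decreasing order.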




(20 points) Given a data matrix $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{p \times n}$ consisting of $n$ data points, where each data point is $p$-dimensional,



Outline the procedure for computing the PCA of $X$;




State the "minimum reconstruction error" property of PCA (one possible formalization is sketched after these sub-questions).




Prove the minimum reconstruction error property of PCA by using the best low-rank approximation property of SVD.
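One possible formalization of the property (a hedged sketch; the notation $W$, $\bar{x}$, $\tilde{X}$ below is not from the assignment, only a common convention): with mean $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and centered data $\tilde{X} = X - \bar{x}\mathbf{1}^T$, the top-$k$ principal directions solve
\[
\min_{W \in \mathbb{R}^{p \times k},\; W^T W = I_k}
\sum_{i=1}^{n} \bigl\| (x_i - \bar{x}) - W W^T (x_i - \bar{x}) \bigr\|_2^2
= \bigl\| \tilde{X} - W W^T \tilde{X} \bigr\|_F^2 ,
\]
and the minimum is attained when the columns of $W$ are the top-$k$ left singular vectors of $\tilde{X}$ (equivalently, the top-$k$ eigenvectors of the sample covariance matrix). The link to the best rank-$k$ approximation property of the SVD comes from noting that $W W^T \tilde{X}$ has rank at most $k$.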




(20 points) Hierarchical clustering: Use the similarity matrix in Table 1 to perform single (MIN) and complete (MAX) link hierarchical clustering. Show your results by drawing a dendrogram. The dendrogram should clearly show the order in which the points are merged.
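The dendrograms are expected to be drawn by hand, but a short script can be used to check the merge order. A minimal sketch, assuming the similarities from Table 1 are converted to distances via d = 1 - s (this conversion is an assumption, not something specified by the assignment):

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.spatial.distance import squareform
    from scipy.cluster.hierarchy import linkage, dendrogram

    # Similarity matrix from Table 1 (rows/columns: p1..p5).
    S = np.array([
        [1.00, 0.10, 0.41, 0.55, 0.35],
        [0.10, 1.00, 0.64, 0.47, 0.98],
        [0.41, 0.64, 1.00, 0.44, 0.85],
        [0.55, 0.47, 0.44, 1.00, 0.76],
        [0.35, 0.98, 0.85, 0.76, 1.00],
    ])

    # Convert to distances (assumed d = 1 - s) and condense the matrix.
    D = squareform(1.0 - S)

    labels = ["p1", "p2", "p3", "p4", "p5"]
    for method in ("single", "complete"):  # MIN link and MAX link
        Z = linkage(D, method=method)      # rows give the merge order and heights
        plt.figure()
        dendrogram(Z, labels=labels)
        plt.title(method + "-link dendrogram")
    plt.show()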



(30 points) Principal Component Analysis: In this homework, you will apply principal component analysis to a collection of handwritten digit images from the USPS dataset. The USPS dataset is in the "data" folder: USPS.mat. The starting code is in the "code" folder. The whole dataset has already been loaded into the matrix A. The matrix A has shape $3000 \times 256$ and contains all the images. Each row of A corresponds to a handwritten digit image (a digit between 0 and 9) of size $16 \times 16$. You are expected to implement your solution based on the given code. The only file you need to modify is the "solution.py" file. You can test your solution by running the "main.py" file.
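For orientation, a minimal sketch of how one row of A can be viewed as a $16 \times 16$ image (the loadmat key "A" and the use of matplotlib are assumptions; the starter code in main.py already loads A for you):

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.io import loadmat

    # Assumption: the array inside USPS.mat is stored under the key "A",
    # matching the matrix name used in the assignment text.
    A = loadmat("data/USPS.mat")["A"]   # expected shape: (3000, 256)

    # Each row is one flattened 16 x 16 digit image.
    img = A[0].reshape(16, 16)
    plt.imshow(img, cmap="gray")
    plt.show()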






Table 1: Similarity matrix.



      p1    p2    p3    p4    p5
p1    1.00  0.10  0.41  0.55  0.35
p2    0.10  1.00  0.64  0.47  0.98
p3    0.41  0.64  1.00  0.44  0.85
p4    0.55  0.47  0.44  1.00  0.76
p5    0.35  0.98  0.85  0.76  1.00















(a) (10 points) Complete the pca() function. Your code will be tested with p = 10, 50, 100, and 200, i.e., four different numbers of principal components. (A hedged sketch of one possible implementation of parts (a)-(c) follows part (d).)



(b) (5 points) Complete the reconstruction() function to reconstruct the data using the principal components selected in (a).



(c) (5 points) Complete the reconstruct_error() function to measure the reconstruction error.



(d) (10 points) Run "main.py" to see the reconstruction results and summarize your observations in a short report. When you run the "main.py" file, a subset (the first two) of the reconstructed images based on p = 10, 50, 100, 200 principal components will be saved automatically in the "code" folder. Please also attach these images to your report.
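A minimal sketch of how the three functions above might be implemented (the signatures below are assumptions; the actual interfaces are fixed by the starter code in solution.py and main.py, so adapt accordingly):

    import numpy as np

    def pca(A, p):
        """Return the first p principal components of the row-wise data in A.

        Assumes A has shape (n_samples, n_features); the components are taken
        from the SVD of the mean-centered data.
        """
        mean = A.mean(axis=0)
        A_centered = A - mean
        # Rows of Vt are the principal directions, ordered by singular value.
        U, S, Vt = np.linalg.svd(A_centered, full_matrices=False)
        components = Vt[:p].T              # shape (n_features, p)
        return components, mean

    def reconstruction(A, components, mean):
        """Project the centered data onto the components and map it back."""
        scores = (A - mean) @ components            # (n_samples, p)
        return scores @ components.T + mean         # (n_samples, n_features)

    def reconstruct_error(A, A_rec):
        """Total squared reconstruction error (squared Frobenius norm)."""
        return float(np.sum((A - A_rec) ** 2))

Whether the mean is returned by pca(), and whether the error is summed or averaged, depends on main.py; read this as a sketch of the computation rather than a drop-in solution.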






Deliverable: You should submit (1) a hard-copy report (along with your write-up for the other questions) that summarizes your results, and (2) the "solution.py" file, to Blackboard.




Note: You are NOT supposed to use existing PCA code; instead, you should write your own PCA function. Please read the "Readme.txt" file carefully before you start this assignment.





























































































