- (20
points) Let A = U V T
be the SVD of A, where A 2 Rm
n,
U 2 Rm
m
and V 2 Rn
n
are orthogonal matrices, = diag( 1;
; r;
0; ; 0), and r = rank(A). Show that
- The
rst r columns of U are eigenvectors of AAT
corresponding to nonzero eigenvalues.
-
The
rst r columns of V are eigenvectors of AT
A corresponding to nonzero eigenvalues.
- (10
points) Given a symmetric matrix A 2 R3
3,
suppose its eigen-decomposition can be
-
What
is the singular value decomposition of this matrix?
- (20
points) Given a data matrix X = [x1;
x2;
; xn]
2 Rp
n
consisting of n data points, and each data point is p-dimensional,
Outline
the procedure for computing the PCA of X;
State
what is the \minimum reconstruction error" property of PCA.
Prove
the minimum reconstruction error property of PCA by using the best
low-rank approximation property of SVD.
-
(20
points) Hierarchical clustering: Use the similarity matrix in Table
1 to perform single (MIN) and complete (MAX) link hierarchical
clustering. Show your results by drawing a dendrogram. The
dendrogram should clearly show the order in which the points are
merged.
-
(30
points) Principal Component Analysis: In this homework, you will
apply the principal component analysis to a collection of
handwritten digit images from the USPS dataset. The USPS dataset is
in the \data" folder: USPS.mat. The starting code is in the
\code" folder. The whole data has already been loaded into the
matrix A. The matrix A has shape 3000 256 and contains all the
images. Each row in A corresponds to a handwritten digit image
(between 0 and 9) with size 16 16. You are expected to implement
your solution based on the given codes. The only le you need to
modify is the \solution.py" le. You can test your solution by
running the \main.py" le.
1
Table 1:
Similarity matrix.
-
p1
p2
p3
p4
p5
- (10
points) Complete the pca() function. Your code will be tested on p =
10, 50, 100, 200, total four di erent number of the principal
components.
- (5
points) Complete the reconstruction() function to reconstruct the
data using the selected principal components from (a).
- (5
points) Complete the reconstruct error() function to measuring the
reconstruction error.
-
(10
points) Run \main.py" to see the reconstruction results and
summarize your obser-vations from the results into a short report.
When you run the \main.py" le, a subset (the rst two) of the
reconstructed images based on p = 10, 50, 100, 200 principal
com-ponents will be automatically saved on the \code" folder.
Please attach these images into your report also.
Deliverable:
You should submit (1) a hard-copy report (along with your write-up
for other questions) that summarizes your results and (2) the
\solution.py" le to the Blackboard.
Note:You
are NOT supposed to use existing PCA code; instead, you should write
your own PCA function. Please read the \Readme.txt" le carefully
before you start this assignment.