This homework contains 2 questions. The last question requires programming. The maximum number of points is 100 plus 20 bonus points.
Question 1 – PCA via Successive Deflation (30 points)
(Adapted from Murphy Exercise 12.7)
Suppose we have a set of $n$ data points $x_1, \ldots, x_n$, where each $x_i$ is represented as a $d$-dimensional column vector. Assume that the data has been centered, i.e., has zero mean: $\frac{1}{n}\sum_{i=1}^{n} x_i = 0$. Let $X = [x_1, \ldots, x_n]$ be the $(d \times n)$ matrix where column $i$ is equal to $x_i$. Define $C = \frac{1}{n} X X^T$ to be the covariance matrix of $X$, where $c_{ij} = \frac{1}{n}\sum_{l=1}^{n} x_{il} x_{jl} = \mathrm{covar}(i, j)$ and $x_{il}$ denotes entry $(i, l)$ of $X$.
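As a small illustration of the definition above, the following sketch builds $C = \frac{1}{n} X X^T$ for a toy centered data set in plain Python. The data values and the helper names `row_means` and `covariance` are made up for this example and are not part of the handout.

```python
def row_means(X):
    """Mean of each row (feature) of a d x n matrix stored as a list of rows."""
    return [sum(row) / len(row) for row in X]


def covariance(X):
    """Return C = (1/n) X X^T for a centered d x n matrix X (list of rows)."""
    d, n = len(X), len(X[0])
    return [[sum(X[i][l] * X[j][l] for l in range(n)) / n for j in range(d)]
            for i in range(d)]


# Toy data (hypothetical): d = 2 features, n = 4 points, one point per column.
X = [[2.0, -2.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, -1.0]]

# The definition assumes centered data, so check that every feature has zero mean.
assert all(abs(m) < 1e-12 for m in row_means(X))

C = covariance(X)
print(C)  # diagonal matrix with entries 2.0 and 0.5
```

Because the toy covariance is diagonal, its eigenvectors are the coordinate axes, which makes it easy to check later steps by hand.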
Next, order the eigenvectors of $C$ by their eigenvalues (largest first), and let $v_1, v_2, \ldots, v_k$ be the first $k$ eigenvectors. These satisfy
$$v_i^T v_j = \begin{cases} 0 & \text{if } i \neq j \\ 1 & \text{if } i = j \end{cases}$$
$v_1$ is the first principal eigenvector of $C$ (the eigenvector with the largest eigenvalue), and as such satisfies $C v_1 = \lambda_1 v_1$. Now define $\tilde{x}_i$ as the orthogonal projection of $x_i$ onto the space orthogonal to $v_1$:
$$\tilde{x}_i = (I - v_1 v_1^T) x_i$$
Finally, define $\tilde{X} = [\tilde{x}_1, \ldots, \tilde{x}_n]$ as the deflated matrix of rank $d - 1$, which is obtained by removing from the $d$-dimensional data the component that lies in the direction of the first principal eigenvector:
$$\tilde{X} = (I - v_1 v_1^T) X$$
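A minimal sketch of the deflation step on toy data may help make the projection concrete. It assumes the leading eigenvector is already known (here the first coordinate axis, because the toy covariance is diagonal); the data and the function name `deflate` are hypothetical and only illustrate the definition.

```python
def deflate(X, v):
    """Apply (I - v v^T) to every column of the d x n matrix X (list of rows).

    v is assumed to be a unit-norm vector of length d.
    """
    d, n = len(X), len(X[0])
    # Coefficient of each data point along v: v^T x_i for every column i.
    coeffs = [sum(v[r] * X[r][i] for r in range(d)) for i in range(n)]
    # Subtract the component along v from every column.
    return [[X[r][i] - v[r] * coeffs[i] for i in range(n)] for r in range(d)]


# Toy centered data (hypothetical): the variance is largest along the first
# axis, so v1 = (1, 0) is the leading eigenvector of C = (1/n) X X^T.
X = [[2.0, -2.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, -1.0]]
v1 = [1.0, 0.0]

X_tilde = deflate(X, v1)
print(X_tilde)  # first row is zeroed out, second row is unchanged
```

Every column of the result is orthogonal to `v1`, which is what "removing the component in the direction of the first principal eigenvector" means.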
1. [7 points] Show that the covariance of the deflated matrix, $\tilde{C} = \frac{1}{n} \tilde{X} \tilde{X}^T$, is given by
$$\tilde{C} = \frac{1}{n} X X^T - \lambda_1 v_1 v_1^T$$
(Hint: Some useful facts: $(I - v_1 v_1^T)$ is symmetric, $X X^T v_1 = n \lambda_1 v_1$, and $v_1^T v_1 = 1$. Also, for any matrices $A$ and $B$, $(AB)^T = B^T A^T$.)
2. [7 points] Show that for $j \neq 1$, if $v_j$ is a principal eigenvector of $C$ with corresponding eigenvalue $\lambda_j$ (that is, $C v_j = \lambda_j v_j$), then $v_j$ is also a principal eigenvector of $\tilde{C}$ with the same eigenvalue $\lambda_j$.
3. [8 points] Let $u$ be the first principal eigenvector of $\tilde{C}$. Explain why $u = v_2$. (You may assume $u$ is unit norm.)
4. [8 points] Suppose we have a simple method $f$ for finding the leading eigenvector and eigenvalue of a positive-definite matrix, denoted by $[\lambda, u] = f(C)$. Write some pseudocode for finding the first $k$ principal basis vectors of $X$ that only uses the special $f$ function and simple vector arithmetic.
(Hint: This should be a simple iterative routine that takes only a few lines to write. The input is $C$, $k$, and the function $f$; the output should be $v_j$ and $\lambda_j$ for $j \in \{1, \ldots, k\}$.)
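The handout does not say how $f$ is implemented. For experimenting numerically, one possible stand-in is power iteration, sketched below in plain Python. This is an assumption for illustration only, not the method the question has in mind, and it covers only the black-box $f$, not the routine the question asks for. The parameters `num_iters` and `tol` are hypothetical.

```python
import math


def f(C, num_iters=1000, tol=1e-12):
    """Return (lam, u): leading eigenvalue and unit eigenvector of symmetric C.

    Power iteration sketch; C is a list of rows. Assumes the start vector is
    not orthogonal to the leading eigenvector and the top eigenvalue is unique.
    """
    d = len(C)
    u = [1.0 / math.sqrt(d)] * d  # arbitrary unit-norm start vector
    lam = 0.0
    for _ in range(num_iters):
        w = [sum(C[i][j] * u[j] for j in range(d)) for i in range(d)]  # w = C u
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # C u = 0, nothing to normalize
        u_next = [x / norm for x in w]
        # Rayleigh quotient u^T C u estimates the eigenvalue.
        lam_next = sum(u_next[i] * sum(C[i][j] * u_next[j] for j in range(d))
                       for i in range(d))
        converged = abs(lam_next - lam) < tol
        u, lam = u_next, lam_next
        if converged:
            break
    return lam, u


lam, u = f([[2.0, 0.0], [0.0, 0.5]])
print(lam, u)  # lam is close to 2, u is close to (1, 0)
```

Power iteration converges slowly when the two largest eigenvalues are close, so a library eigensolver is usually a better choice outside of a toy setting.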
Question 2 – Action recognition with CNN (70 points+20 bonus)
In this question, you will train a convolutional neural network (CNN) to classify images and videos using PyTorch. We use the UCF101 data (see http://crcv.ucf.edu/data/UCF101.php). There are also 10 classes of data in this homework, but the data and the number of classes are different from those of Homework 4. Each clip has 3 frames and each frame is 64 × 64 pixels. The labels of the training and validation clips are provided in hw6_data.mat.
You will first train a CNN for action classification for each image. You will then improve the network architecture and submit the classification results on the test data to Kaggle. Then, you will train a CNN using 3D convolution for a set of video frames (rather than for individual frames), and submit your results to Kaggle.
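When changing the network architecture for 64 × 64 frames, it helps to track how each layer changes the spatial size, since the first fully connected layer must match the final feature-map size. The sketch below uses the standard output-size formula for convolution and pooling layers with dilation 1. The helper name `conv_out_size` and the layer stack are hypothetical and are not the architecture from the notebook.

```python
def conv_out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


# Hypothetical layer stack for a 64 x 64 frame.
size = 64
size = conv_out_size(size, kernel=3, padding=1)  # 3x3 conv, padding 1 -> 64
size = conv_out_size(size, kernel=2, stride=2)   # 2x2 max pool        -> 32
size = conv_out_size(size, kernel=3, padding=1)  # 3x3 conv, padding 1 -> 32
size = conv_out_size(size, kernel=2, stride=2)   # 2x2 max pool        -> 16
print(size)  # 16
```

With this stack, a layer with `c` output channels would feed `c * 16 * 16` features into the first fully connected layer.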
The detailed instructions and questions are in the Jupyter notebook Action_CNN.ipynb. In this file, there are 8 ‘ToDo’ spots for you to fill in. The score of each ToDo is specified at the spot. For the 5th and 8th ToDos, you need to submit CSV result files to Kaggle. The results will be evaluated by Categorization Accuracy. For the 5th ToDo, submit to https://www.kaggle.com/c/cse512f18hw6img. For the 8th ToDo, submit to https://www.kaggle.com/c/cse512f18hw6vid.
We will maintain a leaderboard for each Kaggle competition, and the top three entries at the end of the competition (official assignment due date) will receive 10 bonus points. Any submission that rises to the top three after the assignment deadline is not eligible for bonus points. The ranking will be based on the Categorization Accuracy. Marks for these questions will be scaled according to the ranking on the Private Leaderboard. To prevent exploiting test data, you are allowed to make a maximum of 2 submissions per 24 hours. Your submission will be evaluated immediately and the leaderboard will be updated.
Environment setting
Please make a ./data folder in the same directory as the Action_CNN.ipynb file. Put the data folders ./trainClips, ./valClips, ./testClips and the file hw6_data.mat under ./data.
We recommend using a virtual environment for the project. If you choose not to use a virtual environment, it is up to you to make sure that all dependencies for the code are installed globally on your machine. To set up a virtual environment, run the following in the command-line interface:
cd your_hw6_folder
sudo pip install virtualenv      # This may already be installed
virtualenv .env                  # Create a virtual environment
source .env/bin/activate         # Activate the virtual environment
pip install -r requirements.txt  # Install dependencies
# Note that this does NOT install TensorFlow or PyTorch,
# which you need to do yourself.

# Work (hard) on the assignment
# ... and when you’re done:
deactivate                       # Exit the virtual environment
Note that every time you want to work on the assignment, you should run ‘source .env/bin/activate’ (from within your hw6 folder) to re-activate the virtual environment, and deactivate again whenever you are done.
What to submit?
3.1 Blackboard submission
You will need to submit both your code and your answers to questions on Blackboard. Put the answer file and your code in a folder named SBUID_FirstName_LastName (e.g., 10947XXXX_lionel_messi). Zip this folder and submit the zip file on Blackboard. Your submission must be a zip file, i.e., SBUID_FirstName_LastName.zip.
The answer file should be named answers.pdf. The first page of answers.pdf should be the filled cover page at the end of this homework. The remainder of the answer file should contain:
1. Answers (and derivations) to Question 1
You can use LaTeX if you wish, but it is not compulsory.
3.2 Kaggle submission
For Question 2, you must submit a CSV file to get the accuracy from the competition sites mentioned above. A submission file should contain two columns: Id and Class. The file should contain a header and have the following format.
Id,Class
42,2
43,5
...,...
Two sample submission files are available from the competition site and our handout.
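As a rough sketch of how such a file could be produced, the snippet below writes a header plus one row per test clip using Python's standard `csv` module. The function name `write_submission` and the example ids and labels are hypothetical; the sample submission files remain the authoritative reference for the exact format.

```python
import csv
import io


def write_submission(ids, predictions, out_file):
    """Write an Id,Class CSV with a header row to an open text file."""
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["Id", "Class"])
    for sample_id, label in zip(ids, predictions):
        writer.writerow([sample_id, label])


# Hypothetical ids and predicted labels, mirroring the sample rows above.
buffer = io.StringIO()
write_submission([42, 43], [2, 5], buffer)
print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, pass a file opened with `open(path, "w", newline="")`, where `path` is whatever file name you choose.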
Cheating warnings
Don’t cheat. You must do the homework yourself, otherwise you won’t learn. You cannot ask or discuss the homework with students from previous years. You cannot look up the solution online.
Cover page for answers.pdf
Your Name:
Solar ID:
NetID email address:
Names of people whom you discussed the homework with: