Starting from:
$30

$24

Homework 2 Solution

Version 1







Instructions.




Homework is due Tuesday, February 26, at 11:59pm; no late homework accepted. You will have everything you need to solve these problems after both deep network lectures.




Everyone must submit individually at gradescope under hw2 and hw2code.




The \written" submission at hw2 must be typed, and submitted in any format gradescope accepts (to be safe, submit a PDF). You may use LATEX, markdown, google docs, MS word, whatever you like; but it must be typed!




When submitting at hw2, gradescope will ask you to mark out boxes around each of your answers; please do this precisely!




Please make sure your NetID is clear and large on the rst page of the homework.




Your solution must be written in your own words. Please see the course webpage for full academic integrity information. Brie y, you may have high-level discussions with at most 3 classmates, whose NetIDs you should place on the rst page of your solutions, and you should cite any external reference you use; despite all this, your solution must be written in your own words.




We reserve the right to reduce the auto-graded score for hw2code if we detect funny business (e.g., rather than implementing an algorithm, you keep re-submitting the assignment to the auto-grader, eventually completing a binary search for the answers).




There are no regrade requests on hw2code, which is the code auto-grader; however, you can re-submit and re-grade as many times as you like before the deadline! Start early and report any issues on piazza!




Methods and functions in the template and utility code include docstrings to describe the inputs and outputs. The autograder relies on correct implementations of these methods. Follow the docstrings to avoid failing tests.






Singular Value Decomposition and norms.



Given a matrix A 2 Rn d, de ne its spectral norm kAk2 as



kAk2 := max kAxk2:

x2Rd;kxk2 1




Prove that kAk2 is equal to the largest singular value of A. You may assume A is not the 0 matrix.




Hint: write A in terms of its SVD, and rewrite x in one of the bases provided by the SVD.




Prove that for any matrix A 2 Rn d and vector x 2 Rd, then kAxk2 kAk2 kxk2.



Given a matrix A 2 Rn d, de ne its Frobenius norm kAk2 as
A F =
v


:
Aij2
k k
u
n d


i=1 j=1




uXX


t




(This is the same as the vector k k2 norm of the unrolled/ attened/vectorized matrix.) Prove that kAk2F = tr(ATA), where tr denotes the trace.

(d) Continuing with the previous part, now show




A F =
v


;
si2
k k
u
r


i=1




uX


t




the (vector) ‘2 norm of the singular values of A (or 0 if there are none).




Hint: the previous part gives a few convenient ways to do this problem after you replace A with its SVD.




(e) Given matrices A 2 Rn d and B 2 Rd m, prove




kABkF kAk2 kBkF:




Hint: note that kBk2F = Pmi=1 kB:;ik22, where B:;i denotes the ith column of B.

(f) Suppose A 2 Rn d has rank r 1. Prove




kA+Ak2 = kAA+k2 = 1




and
pr:
kA+AkF = kAA+kF =



Choose a matrix A 2 Rn n with rank 1 r < n, where n 2, and a vector v 2 Rn so that A+Av = 0 but AA+v 6= 0.
Please use only 1-2 sentences for your answer.




Hint: one convenient way to solve and then state your answer is to de ne/construct A via its SVD.




Remark: the point of this part is that inverses and pseudoinverses really can behave di erently!










Solution. (Your solution here.)


Singular Value Decomposition and image reconstruction.
de ne the min-k-P
r
T
minfk;rg
T


i i
i








P


i=maxf1;r k+1g






(a) Say that if A =
i=1 siuivi , then
i=1
siuivi
is the k-reconstruction of A. Similarly, we
reconstruction to be
P
r




s u v
T
. Implement reconstruct


SVD, as










per the docstrings.




















(b) Add in your writeup a log-scale plot of the singular values of the red color channel: Let A denote the red color channel and A = Pr siuivT denote its singular value decomposition, for

i=1 i




each 1 i r, plot ln(1 + si). Comment on the distribution of the singular values. Is the log-singular-value plot linear?




Use the function get img to get an old NASA Astronomy Picture of the Day. Include in your report plots of



The original image



The 100-reconstruction of the image



The 50-reconstruction of the image



The 20-reconstruction of the image



The 10-reconstruction of the image



The min-600-reconstruction of the image












Solution. (Your solution here.)




Neural Networks on XOR.



In this problem you will demonstrate that a two-layer neural network with the ReLU activation function can classify the XOR dataset correctly. This will also serve as an introduction to writing your own neural networks in PyTorch! Consider the two layer neural network below




x 7!W 2 1(W 1x + b1) + b2:




(a) Implement your network in the class XORNet. You will need to modify init , set l1, set l2, and forward methods. The setter methods are used by the autograder to test your network implementation. Note: to maintain consistency with PyTorch’s torch.nn.Linear, the arguments







to set l1, set l2 will have shapes consistent with the following network formulation: f(X) = 1(XW T1 + b1)W T2 + b2, where X 2 Rn d.




De ne the fit function. Please refer to the docstring and template code for details.



Using your fit function, train an XORNet on the XOR dataset for 5000 epochs, and then use contour torch to plot your resulting network. Include the plot in your writeup. Did you successfully classify the XOR points, or did your gradient descent get stuck in a local minima of the loss function?












Solution. (Your solution here.)





Convolutional Neural Networks.



In this problem, you will use convolutional neural networks to learn to classify handwritten digits. The digits will be encoded as 8x8 matrices. The layers of your neural network should be:




A 2D convolutional layer with 1 input channel and 8 output channels, with a kernel size of 3 A 2D maximimum pooling layer, with kernel size 2




A 2D convolutional layer with 8 input channels and 4 output channels, with a kernel size of 3 A fully connected (torch.nn.Linear) layer with 4 inputs and 10 outputs




Apply the ReLU activation function to the output of each of your convolutional layers before inputting them to your next layer. For both of the convolutional layers of the network, use the default settings parameters (stride=1, padding=0, dilation=1, groups=1, bias=True).




Implement the class DigitsConvNet. Please refer to the dosctrings in hw2.py for details.



Implement fit and validate for use in the next several parts. Please do not shu e the inputs when batching in this part! The utility function loss batch will be useful. See the docstrings in hw2.py and hw2 util.py or details.



Fit a DigitsConvNet on the train dataset from torch digits. Use CrossEntropyLoss, an SGD optimizer with learning rate 0.005 and no momentum, and train your model for 30 epochs with batch size of 1. Keep track of your training and validation loss for the next part.



Fit another DigitsConvNet. This time we will adjust the learning rate so that it decreases at each epoch. Recall the gradient descent update step






wi+1 = wi trwF (wi):




Here, i is the step, and t is the epoch. We will update the learning rate at each epoch so t+1 = t. You should use torch.optim.lr scheduler.ExponentialLR. We will use the a decay rate of = 0:95, and start the learning rate at 0.005. Save your neural network to a le using torch.save(model.cpu().state dict(), "conv.pb"). You will submit this le to the autograder.







Fit a third DigitsConvNet, again with an SGD optimizer with learning rate 0.005 and no momentum. However, this time, use a batch size of 16. Plot the epochs vs loss for parts (b), (c), and (d). Include the plot and your assessment of the impact of the exponentially decayed learning rate on training speed. Additionally, comment on the impact of increasing batch size. (In your report, may simply leave parts (c) and (d) blank, and include all comments in part (e).)



The last layer of the network can be interpreted as a linear classi er on top of a feature representation constructed by the earlier layers. The purpose of this sub-question is to assess the quality of these features. Implement the method intermediate, as speci ed in the docstring. Then, use the function plot PCA (included in the hw2 utils.py) to make a scatterplot of your the intermediate representation of the training data. (You will learn about PCA later in the course, but for now, it is su cient to know it can be used for dimensionality reduction.) Include your plot in the writeup. Use your best neural net (according to validation accuracy) for this plot.



Use the feature representation of your training dataset and labels to create a scipy.spatial.KDTree. Then, do 5-nearest-neighbors classi cation of the the feature representation of your validation dataset with the KDTree (this uses the KDTree’s query method). You should use the same neural net as in part (f) for this problem. Report the accuracy of your KDTree on the validation set.












Solution. (Your solution here.)












5

More products