
Deep Learning Homework 1 Solution




Instructions




This homework is due February 13th at 11:59pm. Late submission policies apply. You will submit a write-up and your code for this homework.




Submission instructions will be updated soon.




[50 points] Neural network layer implementation.



In this problem, you will implement various layers. $X$ and $Y$ represent the input and the output of a layer, respectively, and $L$ is a scalar-valued function.




For each layer, in addition to the derivations below, implement the corresponding forward and backward passes in layers.py.




Fully-connected layer



Let $X \in \mathbb{R}^{D_{in}}$. Consider a dense layer with parameters $W \in \mathbb{R}^{D_{in} \times D_{out}}$ and $b \in \mathbb{R}^{D_{out}}$. The layer outputs a vector $Y \in \mathbb{R}^{D_{out}}$ given by $Y = W^T X + b$.

Compute the partial derivatives $\frac{\partial L}{\partial W}$, $\frac{\partial L}{\partial b}$, $\frac{\partial L}{\partial X}$ in terms of $\frac{\partial L}{\partial Y}$.
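
A minimal NumPy sketch of such a pair of passes is shown below; the exact function names and signatures should follow the starter code in layers.py (fc_forward/fc_backward here are illustrative), and the sketch assumes a batch of inputs stacked along the first axis.

    import numpy as np

    def fc_forward(X, W, b):
        # X: (N, D_in), W: (D_in, D_out), b: (D_out,)
        Y = X @ W + b                     # Y = W^T x + b applied to each row of X
        cache = (X, W, b)
        return Y, cache

    def fc_backward(dY, cache):
        # dY: upstream gradient dL/dY of shape (N, D_out)
        X, W, b = cache
        dX = dY @ W.T                     # dL/dX
        dW = X.T @ dY                     # dL/dW
        db = dY.sum(axis=0)               # dL/db, summed over the batch
        return dX, dW, db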




ReLU



Let $X$ be a tensor and $Y = \mathrm{ReLU}(X)$. Express $\frac{\partial L}{\partial X}$ in terms of $\frac{\partial L}{\partial Y}$.
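
For reference, a minimal sketch (illustrative names, NumPy assumed):

    import numpy as np

    def relu_forward(X):
        Y = np.maximum(X, 0)
        cache = X
        return Y, cache

    def relu_backward(dY, cache):
        X = cache
        # The gradient passes through only where the input was positive.
        dX = dY * (X > 0)
        return dX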




Dropout



Given a dropout mask $M$, let $Y = X \odot M$, where $\odot$ represents element-wise multiplication. Express $\frac{\partial L}{\partial X}$ in terms of $\frac{\partial L}{\partial Y}$.




Note: For the forward pass implementation, you will have to consider train and test scenarios.
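
A sketch of one common implementation ("inverted" dropout, which rescales the kept activations at training time so the test-time pass is the identity) is shown below; whether the assignment expects this convention or plain dropout with test-time scaling should be checked against the starter code, and the names and signatures here are illustrative.

    import numpy as np

    def dropout_forward(X, p, mode):
        # p: probability of dropping a unit; mode: 'train' or 'test'.
        if mode == 'train':
            M = (np.random.rand(*X.shape) >= p) / (1.0 - p)   # mask with rescaling
            Y = X * M
        else:
            M = None
            Y = X                                             # identity at test time
        return Y, (M, mode)

    def dropout_backward(dY, cache):
        M, mode = cache
        # dL/dX = dL/dY * M in train mode; identity in test mode.
        return dY * M if mode == 'train' else dY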




Batch Normalization

Let $X$ be a 2D tensor of input instances where $X_i$ represents the $i$-th example vector. The output of the layer $Y$ is given by $Y_i = \gamma \frac{X_i - \mu}{\sigma} + \beta$, where $\mu = \frac{1}{n} \sum_j X_j$ and $\sigma = \sqrt{\frac{1}{n} \sum_j (X_j - \mu)^2 + \epsilon}$. $\bar{\mu}$ and $\bar{\sigma}$ are constants that represent a running average of the sample mean and variance, respectively. Note that $\mu$, $\sigma$, $\gamma$, $\beta$ are all vectors.

Derive expressions for $\frac{\partial \mu}{\partial X_i}$ and $\frac{\partial \sigma}{\partial X_i}$.

Based on this, derive an expression for $\frac{\partial L}{\partial X_i}$ in terms of $\frac{\partial L}{\partial Y}$.




Note: For the forward pass implementation, you will have to consider train and test scenarios.
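
A sketch of the forward pass under these definitions is given below, with illustrative names and an assumed exponential-moving-average update for the running statistics (the momentum value and the exact interface should follow layers.py):

    import numpy as np

    def batchnorm_forward(X, gamma, beta, running_mean, running_var,
                          eps=1e-5, momentum=0.9, mode='train'):
        # X: (n, D); gamma, beta, running_mean, running_var: (D,)
        if mode == 'train':
            mu = X.mean(axis=0)
            var = X.var(axis=0)
            Y = gamma * (X - mu) / np.sqrt(var + eps) + beta
            # Update the running averages used at test time.
            running_mean = momentum * running_mean + (1 - momentum) * mu
            running_var = momentum * running_var + (1 - momentum) * var
        else:
            Y = gamma * (X - running_mean) / np.sqrt(running_var + eps) + beta
        return Y, running_mean, running_var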
















Convolution



Note: In this question, any indices involved ($i$, $j$, etc.) start from 1, not 0.




Given 2-d tensors $a \in \mathbb{R}^{H \times W}$ and $b \in \mathbb{R}^{H' \times W'}$, we define the valid convolution and full convolution operations as follows,

$(a \text{ valid } b)_{i,j} = \sum_{m,n} a_{m,n}\, b_{i-m+H',\, j-n+W'}$

$(a \text{ full } b)_{i,j} = \sum_{m,n} a_{m,n}\, b_{i-m+1,\, j-n+1}$




The convolution operation we discussed in class is valid convolution, and does not involve any zero padding. This operation produces an output of size $(H - H' + 1) \times (W - W' + 1)$.

Full convolution can be thought of as zero padding $a$ on all sides with width and height one less than the size of the kernel (i.e., $H' - 1$ vertically and $W' - 1$ horizontally) and then performing valid convolution. This operation produces an output of size $(H + H' - 1) \times (W + W' - 1)$ (verify this).










It is also useful to consider the filtering operation filt, defined by

$(a \text{ filt } b)_{i,j} = \sum_{m,n} a_{i+m-1,\, j+n-1}\, b_{m,n}$

The filtering operation is similar to the valid convolution, except that the filter is not flipped when computing the weighted sum.




You will implement a valid convolution layer for this question. Assume the input to the layer is given by $X \in \mathbb{R}^{N \times C \times H \times W}$. Consider a convolutional kernel $W \in \mathbb{R}^{F \times C \times H' \times W'}$. The output of the valid convolution layer is given by $Y \in \mathbb{R}^{N \times F \times H'' \times W''}$, where $H'' = H - H' + 1$ and $W'' = W - W' + 1$. The layer produces $F$ output feature maps defined by $Y_{n,f} = \sum_c X_{n,c} \text{ valid } \widetilde{W}_{f,c}$, where $\widetilde{W}_{f,c}$ represents the flipped kernel of $W_{f,c}$ (i.e., $K = \widetilde{W}_{f,c}$ is defined as $K_{i,j} = (W_{f,c})_{H'+1-i,\, W'+1-j}$). Note that $Y_{n,f} = \sum_c X_{n,c} \text{ valid } \widetilde{W}_{f,c} = \sum_c X_{n,c} \text{ filt } W_{f,c}$.

Show that

$\frac{\partial L}{\partial X_{n,c}} = \sum_f W_{f,c} \text{ full } \frac{\partial L}{\partial Y_{n,f}}$

and

$\frac{\partial L}{\partial W_{f,c}} = \sum_n X_{n,c} \text{ filt } \frac{\partial L}{\partial Y_{n,f}}$
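
A naive, loop-based sketch of the corresponding forward and backward passes (stride 1, no padding, illustrative names; the forward pass computes $\sum_c X_{n,c} \text{ filt } W_{f,c}$ directly) is shown below:

    import numpy as np

    def conv_forward(X, W):
        # X: (N, C, H, W_in), W: (F, C, HH, WW)
        N, C, H, W_in = X.shape
        F, _, HH, WW = W.shape
        Hout, Wout = H - HH + 1, W_in - WW + 1
        Y = np.zeros((N, F, Hout, Wout))
        for n in range(N):
            for f in range(F):
                for i in range(Hout):
                    for j in range(Wout):
                        # Y[n,f] = sum_c X[n,c] filt W[f,c]
                        Y[n, f, i, j] = np.sum(X[n, :, i:i+HH, j:j+WW] * W[f])
        return Y, (X, W)

    def conv_backward(dY, cache):
        X, W = cache
        F, _, HH, WW = W.shape
        N, _, Hout, Wout = dY.shape
        dX, dW = np.zeros_like(X), np.zeros_like(W)
        for n in range(N):
            for f in range(F):
                for i in range(Hout):
                    for j in range(Wout):
                        # dL/dW[f,c] = sum_n X[n,c] filt dL/dY[n,f]
                        dW[f] += X[n, :, i:i+HH, j:j+WW] * dY[n, f, i, j]
                        # dL/dX[n,c] = sum_f W[f,c] full dL/dY[n,f]
                        dX[n, :, i:i+HH, j:j+WW] += W[f] * dY[n, f, i, j]
        return dX, dW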














To answer questions 2-5, please read through solver.py and familiarize yourself with the API. To build the models, please use the intermediate layers you implemented in question 1. After doing so, use a Solver instance to train the models.




For questions 2 and 3, use the data file data.pkl provided in Canvas for training and evaluating models. The data is provided as a tuple (Input features, Targets). Use the first 500 instances for training and 250 instances each for validation and testing, respectively.
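
A minimal sketch of loading and splitting the data, assuming data.pkl unpickles directly into the (features, targets) tuple described above:

    import pickle

    with open('data.pkl', 'rb') as f:
        features, targets = pickle.load(f)

    # First 500 instances for training, then 250 each for validation and test.
    X_train, y_train = features[:500], targets[:500]
    X_val,   y_val   = features[500:750], targets[500:750]
    X_test,  y_test  = features[750:1000], targets[750:1000]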






















[8 points] Logistic regression and beyond: Binary classification with a sigmoid output layer



Implement the logistic loss layer logistic_loss in layers.py.
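
A numerically stable sketch is shown below; it assumes labels in {0, 1} and a (loss, gradient) return signature, both of which should be checked against the starter code:

    import numpy as np

    def logistic_loss(s, y):
        # s: real-valued scores, shape (N,); y: labels in {0, 1}, shape (N,).
        # Stable form of -[y log(sigmoid(s)) + (1-y) log(1 - sigmoid(s))].
        loss = np.mean(np.maximum(s, 0) - s * y + np.log1p(np.exp(-np.abs(s))))
        p = 1.0 / (1.0 + np.exp(-s))      # sigmoid(s)
        ds = (p - y) / s.shape[0]         # dL/ds, averaged over the batch
        return loss, ds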



Implement a logistic classifier using the starter code provided in logistic.py.



Train a simple logistic classifier (with no hidden units). Report the test accuracy for the best model you identified.



Train a 2-layer neural network with logistic regression as the output layer. Identify and report an appropriate number of hidden units based on the validation set. Report the test accuracy for your best model.



[8 points] SVM and beyond: binary classification with a hinge-loss output layer



Implement the hinge-loss layer svm_loss in layers.py.
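
A sketch assuming labels in {-1, +1} and a (loss, gradient) return signature (check the starter code for the expected conventions):

    import numpy as np

    def svm_loss(s, y):
        # s: real-valued scores, shape (N,); y: labels in {-1, +1}, shape (N,).
        margins = np.maximum(0, 1 - y * s)
        loss = np.mean(margins)
        # Gradient is -y wherever the margin is violated, averaged over the batch.
        ds = np.where(margins > 0, -y, 0.0) / s.shape[0]
        return loss, ds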



Implement a binary SVM classifier using the starter code provided in svm.py.



Train a binary SVM classifier (with no hidden units). Report the test accuracy for the best model you identified.



Train a 2-layer neural network with hinge-loss as the output layer. Identify and report an appropriate number of hidden units based on the validation set. Report the test accuracy for your best model.



[8 points] Softmax regression and beyond: multi-class classification with a softmax output layer



Implement the softmax loss layer softmax_loss in layers.py.
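
A numerically stable sketch assuming integer class labels and a (loss, gradient) return signature (check the starter code for the expected conventions):

    import numpy as np

    def softmax_loss(s, y):
        # s: class scores, shape (N, K); y: integer class labels, shape (N,).
        shifted = s - s.max(axis=1, keepdims=True)   # shift scores for stability
        log_probs = shifted - np.log(np.sum(np.exp(shifted), axis=1, keepdims=True))
        N = s.shape[0]
        loss = -np.mean(log_probs[np.arange(N), y])
        ds = np.exp(log_probs)                       # softmax probabilities
        ds[np.arange(N), y] -= 1
        ds /= N                                      # dL/ds
        return loss, ds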



Implement softmax multi-class classification using the starter code provided in softmax.py.



Train a softmax multi-class classifier (with no hidden units) on the MNIST dataset. Report the test accuracy.



Train a 2-layer neural network with softmax-loss as the output layer on the MNIST dataset. Identify and report an appropriate number of hidden units based on the validation set (you can hold out 1/10 of the training samples as a validation set). Report the test accuracy for your best model.



[8 points] Convolutional Neural Network for multi-class classification



Implement the forward and backward passes of the max-pooling layer in layers.py.
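
A loop-based sketch of both passes (illustrative names and parameters; the actual interface should follow layers.py):

    import numpy as np

    def maxpool_forward(X, pool_h, pool_w, stride):
        # X: (N, C, H, W)
        N, C, H, W = X.shape
        Hout, Wout = (H - pool_h) // stride + 1, (W - pool_w) // stride + 1
        Y = np.zeros((N, C, Hout, Wout))
        for i in range(Hout):
            for j in range(Wout):
                window = X[:, :, i*stride:i*stride+pool_h, j*stride:j*stride+pool_w]
                Y[:, :, i, j] = window.max(axis=(2, 3))
        return Y, (X, pool_h, pool_w, stride)

    def maxpool_backward(dY, cache):
        X, pool_h, pool_w, stride = cache
        dX = np.zeros_like(X)
        _, _, Hout, Wout = dY.shape
        for i in range(Hout):
            for j in range(Wout):
                window = X[:, :, i*stride:i*stride+pool_h, j*stride:j*stride+pool_w]
                # Route the gradient only to the max element of each window.
                mask = (window == window.max(axis=(2, 3), keepdims=True))
                dX[:, :, i*stride:i*stride+pool_h, j*stride:j*stride+pool_w] += \
                    mask * dY[:, :, i, j][:, :, None, None]
        return dX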















Implement a CNN for softmax multi-class classification using the starter code provided in cnn.py.



Train the CNN multi-class classifier on the MNIST dataset. Identify and report an appropriate filter size for the convolutional layer and an appropriate number of hidden units in the fully-connected layer based on the validation set (you can hold out 1/10 of the training samples as a validation set). Report the test accuracy for your best model.



Train the CNN multi-class classifier with dropout and batch normalization on the MNIST dataset. Identify and report an appropriate architecture and an appropriate dropout rate based on the validation set (you can hold out 1/10 of the training samples as a validation set). Report the test accuracy for your best model.



[12 points] Convolutional Neural Networks for CIFAR10 image classification.



You will implement and train a VGG11 model in this question using vgg.py. For this question, you may use the layers provided in PyTorch.




Implement the VGGNet in the VGG class and train the model in the train function. Please follow the architecture described in the comments to build the network.
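
A sketch of the standard VGG11 configuration for 32x32 CIFAR10 inputs is shown below; the comments in vgg.py take precedence over this layout, and the use of batch normalization and the single-linear-layer classifier head here are assumptions:

    import torch
    import torch.nn as nn

    # Standard VGG11 configuration: numbers are output channels of 3x3 convs,
    # 'M' is a 2x2 max pool. Follow the comments in vgg.py if they differ.
    CFG = [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M']

    class VGG(nn.Module):
        def __init__(self, num_classes=10):
            super().__init__()
            layers, in_ch = [], 3
            for v in CFG:
                if v == 'M':
                    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
                else:
                    layers += [nn.Conv2d(in_ch, v, kernel_size=3, padding=1),
                               nn.BatchNorm2d(v),
                               nn.ReLU(inplace=True)]
                    in_ch = v
            self.features = nn.Sequential(*layers)
            # After five 2x2 pools, a 32x32 CIFAR10 image is reduced to 1x1x512.
            self.classifier = nn.Linear(512, num_classes)

        def forward(self, x):
            x = self.features(x)
            x = torch.flatten(x, 1)
            return self.classifier(x)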



Test the model in the test function and report the classification accuracy on the test set.



You are encouraged to try different optimizers, image preprocessing, activation functions, convolutional feature sizes, learning rate schedules, and network architectures to improve the classification accuracy on the test set.



[6 points] Short answer questions



Describe the main difficulty in training deep neural networks with the sigmoid non-linearity.



Consider a linear layer $W$. During training, dropout with probability $p$ is applied to the layer input $x$, and the output is given by $y = W(x \odot m)$, where $m$ represents the dropout mask. How would you use this layer during inference to reduce the train/test mismatch (i.e., what is the input/output relationship)?



If the training loss goes down quickly and then diverges during training, which hyperparameters would you modify?













































