[15 points] Transfer learning
In this problem, you will run experiments for two major transfer learning scenarios in transfer_learning.py.
Fill in the blank in the train_model function, which is a general function for model training.
Fill in the blank in the visualize_model function to briefly visualize how the trained model performs on validation images.
Fill in the blank in the finetune function. Instead of random initialization, we initialize the network with a pre-trained network; the rest of the training looks as usual.
Fill in the blank in the freeze function. We will freeze the weights for all of the network except those of the final fully connected layer. This last fully connected layer is replaced with a new one with random weights, and only this layer is trained (a minimal sketch of both scenarios appears below).
Run the script and report the accuracy on the validation dataset for these two scenarios.
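A minimal sketch of the two scenarios, assuming a torchvision ResNet-18 backbone and a small number of output classes; the actual backbone, dataset, and class count used by transfer_learning.py may differ.

```python
import torch.nn as nn
import torch.optim as optim
import torchvision.models as models

NUM_CLASSES = 2  # assumption; use the number of classes in the provided dataset

# Scenario 1: finetune -- start from ImageNet weights and train every layer.
finetune_model = models.resnet18(pretrained=True)
finetune_model.fc = nn.Linear(finetune_model.fc.in_features, NUM_CLASSES)
# all parameters keep requires_grad=True, so the optimizer updates the whole network

# Scenario 2: freeze -- train only the new final fully connected layer.
freeze_model = models.resnet18(pretrained=True)
for param in freeze_model.parameters():
    param.requires_grad = False  # freeze the pretrained backbone
freeze_model.fc = nn.Linear(freeze_model.fc.in_features, NUM_CLASSES)  # new layer is trainable

# In the freeze scenario, only the final layer's parameters go to the optimizer.
optimizer = optim.SGD(freeze_model.fc.parameters(), lr=0.001, momentum=0.9)
```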
[15 points] Style Transfer
In this problem, you will run experiments for style transfer in style_transfer.py.
1. Implement the content_loss function. Content loss measures how much the feature map of the generated image differs from the feature map of the source image. We only care about the content representation of one layer of the network (e.g., layer $l$):
$$L_c = w_c \sum_{c,i,j} \left(F^l_{c,i,j} - P^l_{c,i,j}\right)^2$$
where $F^l$ is the feature map of the current image, $P^l$ is the feature map of the source content image, $w_c$ as a scalar is the weight for content loss, and the summation is over each element in the feature map.
2. Implement the style_loss function:
$$L_s = \sum_l w_s^l \sum_{i,j} \left(G^l_{i,j} - A^l_{i,j}\right)^2$$
where $G^l$ is the Gram matrix from the feature map of the current image, $A^l$ is the Gram matrix from the feature map of the source style image, and $w_s^l$ as a scalar is the weight for style loss. We consider the style loss for feature maps from multiple layers.
3. Implement the total_variation_loss function. It's helpful to also encourage smoothness in the image by adding another term to our loss that penalizes wiggles or "total variation" in the pixel values. You can compute the "total variation" as the sum of the squares of differences in the pixel values for all pairs of pixels that are next to each other (horizontally or vertically). Here we sum the total-variation regularization for each of the 3 input channels (RGB), and weight the total summed loss by the total variation weight:
$$L_{tv} = w_{tv} \sum_{c=1}^{3} \left( \sum_{i=1}^{H-1} \sum_{j=1}^{W} \left(x_{c,i+1,j} - x_{c,i,j}\right)^2 + \sum_{i=1}^{H} \sum_{j=1}^{W-1} \left(x_{c,i,j+1} - x_{c,i,j}\right)^2 \right)$$
A minimal sketch of these three loss terms appears after this list.
4. Fill in the blank in the style_transfer function to optimize the generated image and run the script. Show the generated images and the learning curve of the loss for each generated image.
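A minimal sketch of the three loss terms above, assuming PyTorch tensors with feature maps of shape (1, C, H, W) and a generated image of shape (1, 3, H, W); the function signatures are illustrative and may differ from the ones expected in style_transfer.py (for instance, the provided script may normalize the Gram matrix).

```python
import torch

def content_loss(content_weight, current_feat, source_feat):
    # L_c = w_c * sum over every element of (F^l - P^l)^2
    return content_weight * torch.sum((current_feat - source_feat) ** 2)

def gram_matrix(feat):
    # feat: (1, C, H, W) -> (C, C) Gram matrix of channel-wise feature correlations
    _, C, H, W = feat.shape
    f = feat.view(C, H * W)
    return f @ f.t()

def style_loss(style_weights, current_feats, source_grams):
    # L_s = sum_l w_s^l * sum_{i,j} (G^l_{i,j} - A^l_{i,j})^2 over the chosen style layers
    loss = 0.0
    for w, feat, A in zip(style_weights, current_feats, source_grams):
        G = gram_matrix(feat)
        loss = loss + w * torch.sum((G - A) ** 2)
    return loss

def total_variation_loss(img, tv_weight):
    # img: (1, 3, H, W); penalize squared differences of vertically and horizontally adjacent pixels
    h_var = torch.sum((img[:, :, 1:, :] - img[:, :, :-1, :]) ** 2)
    w_var = torch.sum((img[:, :, :, 1:] - img[:, :, :, :-1]) ** 2)
    return tv_weight * (h_var + w_var)
```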
[15 points] Forward and Backward propagation module for RNN
In this problem, you will implement your own RNN module in rnn_layers.py.
1. Implement the rnn_step_forward function to do the forward pass for a single timestep $t$ of RNN:
$$h_t = \tanh(W_x x_t + W_h h_{t-1} + b)$$
where $h_t \in \mathbb{R}^m$, $W_x \in \mathbb{R}^{m \times d}$, $x_t \in \mathbb{R}^d$, $W_h \in \mathbb{R}^{m \times m}$, $b \in \mathbb{R}^m$. (In the code, we assume that data is stored in batches so that $X_t \in \mathbb{R}^{n \times d}$ and will work with transposed versions of the parameters, $W_x \in \mathbb{R}^{d \times m}$, so the hidden features can be calculated as $H_t = X_t W_x + H_{t-1} W_h + b$.)
2. Derive $\frac{\partial L}{\partial x_t}$, $\frac{\partial L}{\partial W_x}$, $\frac{\partial L}{\partial W_h}$, $\frac{\partial L}{\partial b}$, and $\frac{\partial L}{\partial h_{t-1}}$ in terms of $\frac{\partial L}{\partial h_t}$ according to the formula of the forward pass above. Implement the rnn_step_backward function to do the backward pass for a single timestep $t$ of RNN.
3. Implement the rnn_forward function to do the forward pass for RNN with a total of $T$ steps. You could call the rnn_step_forward function here. Given $h_0$, this function will output $h_1, h_2, \dots, h_T$.
4. Derive $\frac{\partial L}{\partial x_t}$ ($\forall\, 1 \le t \le T$), $\frac{\partial L}{\partial W_x}$, $\frac{\partial L}{\partial W_h}$, $\frac{\partial L}{\partial b}$, and $\frac{\partial L}{\partial h_0}$ in terms of $\frac{\partial L}{\partial h_t}$ ($\forall\, 1 \le t \le T$). Implement the rnn_backward function to do the backward pass for RNN with a total of $T$ steps. You could call the rnn_step_backward function here. A minimal sketch of these functions appears after this list.
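A minimal NumPy sketch of the step functions and the forward loop under the batched convention above ($X_t \in \mathbb{R}^{n \times d}$, $W_x \in \mathbb{R}^{d \times m}$); the exact signatures and cache contents in rnn_layers.py may differ.

```python
import numpy as np

def rnn_step_forward(x, prev_h, Wx, Wh, b):
    """x: (n, d), prev_h: (n, m), Wx: (d, m), Wh: (m, m), b: (m,)."""
    next_h = np.tanh(x @ Wx + prev_h @ Wh + b)
    cache = (x, prev_h, Wx, Wh, next_h)
    return next_h, cache

def rnn_step_backward(dnext_h, cache):
    """dnext_h: upstream gradient dL/dh_t of shape (n, m)."""
    x, prev_h, Wx, Wh, next_h = cache
    da = dnext_h * (1 - next_h ** 2)   # backprop through tanh
    dx = da @ Wx.T                     # dL/dx_t
    dprev_h = da @ Wh.T                # dL/dh_{t-1}
    dWx = x.T @ da                     # dL/dWx
    dWh = prev_h.T @ da                # dL/dWh
    db = da.sum(axis=0)                # dL/db
    return dx, dprev_h, dWx, dWh, db

def rnn_forward(x, h0, Wx, Wh, b):
    """x: (n, T, d), h0: (n, m); returns h: (n, T, m) and the per-step caches."""
    n, T, _ = x.shape
    m = h0.shape[1]
    h = np.zeros((n, T, m))
    caches, prev_h = [], h0
    for t in range(T):
        prev_h, cache = rnn_step_forward(x[:, t, :], prev_h, Wx, Wh, b)
        h[:, t, :] = prev_h
        caches.append(cache)
    return h, caches
```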
[15 points] Forward and Backward propagation module for LSTM
In this problem, you will implement your own LSTM module in rnn_layers.py.
1. Implement the lstm_step_forward function to do the forward pass for a single timestep of LSTM. This should be similar to the rnn_step_forward function that you implemented above, but using the LSTM update rule instead. Note $h_t \in \mathbb{R}^m$, $c_t \in \mathbb{R}^m$, $x_t \in \mathbb{R}^d$.
$$f_t = \mathrm{sigmoid}(W_{xf} x_t + W_{hf} h_{t-1} + b_f), \quad W_{xf} \in \mathbb{R}^{m \times d},\ W_{hf} \in \mathbb{R}^{m \times m},\ b_f \in \mathbb{R}^m$$
$$i_t = \mathrm{sigmoid}(W_{xi} x_t + W_{hi} h_{t-1} + b_i), \quad W_{xi} \in \mathbb{R}^{m \times d},\ W_{hi} \in \mathbb{R}^{m \times m},\ b_i \in \mathbb{R}^m$$
$$\tilde{c}_t = \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c), \quad W_{xc} \in \mathbb{R}^{m \times d},\ W_{hc} \in \mathbb{R}^{m \times m},\ b_c \in \mathbb{R}^m$$
$$o_t = \mathrm{sigmoid}(W_{xo} x_t + W_{ho} h_{t-1} + b_o), \quad W_{xo} \in \mathbb{R}^{m \times d},\ W_{ho} \in \mathbb{R}^{m \times m},\ b_o \in \mathbb{R}^m$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
$$h_t = o_t \odot \tanh(c_t)$$
(In the code, we assume that data is stored in batches so that $X_t \in \mathbb{R}^{n \times d}$ and will work with transposed versions of the parameters $W_x = [W_{xf}, W_{xi}, W_{xc}, W_{xo}] \in \mathbb{R}^{d \times 4m}$, so the activation can be calculated as $A = X_t W_x + H_{t-1} W_h + b$.)
2. Derive $\frac{\partial L}{\partial x_t}$, $\frac{\partial L}{\partial W_{xf}}$, $\frac{\partial L}{\partial W_{hf}}$, $\frac{\partial L}{\partial b_f}$, $\frac{\partial L}{\partial W_{xi}}$, $\frac{\partial L}{\partial W_{hi}}$, $\frac{\partial L}{\partial b_i}$, $\frac{\partial L}{\partial W_{xc}}$, $\frac{\partial L}{\partial W_{hc}}$, $\frac{\partial L}{\partial b_c}$, $\frac{\partial L}{\partial W_{xo}}$, $\frac{\partial L}{\partial W_{ho}}$, $\frac{\partial L}{\partial b_o}$, $\frac{\partial L}{\partial h_{t-1}}$, and $\frac{\partial L}{\partial c_{t-1}}$ in terms of $\frac{\partial L}{\partial h_t}$ and $\frac{\partial L}{\partial c_t}$ according to the formulas of the forward pass above. Implement the lstm_step_backward function to do the backward pass for a single timestep of LSTM.
3. Implement the lstm_forward function to do the forward pass for LSTM. You could call the lstm_step_forward function here. Given $h_0$, this function will output $h_1, h_2, \dots, h_T$.
4. Derive $\frac{\partial L}{\partial x_t}$ ($\forall\, 1 \le t \le T$), $\frac{\partial L}{\partial W_{xf}}$, $\frac{\partial L}{\partial W_{hf}}$, $\frac{\partial L}{\partial b_f}$, $\frac{\partial L}{\partial W_{xi}}$, $\frac{\partial L}{\partial W_{hi}}$, $\frac{\partial L}{\partial b_i}$, $\frac{\partial L}{\partial W_{xc}}$, $\frac{\partial L}{\partial W_{hc}}$, $\frac{\partial L}{\partial b_c}$, $\frac{\partial L}{\partial W_{xo}}$, $\frac{\partial L}{\partial W_{ho}}$, $\frac{\partial L}{\partial b_o}$, and $\frac{\partial L}{\partial h_0}$ in terms of $\frac{\partial L}{\partial h_t}$ ($\forall\, 1 \le t \le T$). Implement the lstm_backward function to do the backward pass for LSTM. You could call the lstm_step_backward function here. A minimal sketch of the step functions appears after this list.
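A minimal NumPy sketch of the single-step LSTM forward and backward passes under the batched convention above, with the gates packed in the order $[f, i, \tilde{c}, o]$; the exact signatures and cache contents in rnn_layers.py may differ.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step_forward(x, prev_h, prev_c, Wx, Wh, b):
    """x: (n, d), prev_h/prev_c: (n, m), Wx: (d, 4m), Wh: (m, 4m), b: (4m,)."""
    m = prev_h.shape[1]
    a = x @ Wx + prev_h @ Wh + b                 # packed pre-activations, shape (n, 4m)
    f = sigmoid(a[:, 0*m:1*m])                   # forget gate
    i = sigmoid(a[:, 1*m:2*m])                   # input gate
    c_tilde = np.tanh(a[:, 2*m:3*m])             # candidate cell state
    o = sigmoid(a[:, 3*m:4*m])                   # output gate
    next_c = f * prev_c + i * c_tilde
    next_h = o * np.tanh(next_c)
    cache = (x, prev_h, prev_c, Wx, Wh, f, i, c_tilde, o, next_c)
    return next_h, next_c, cache

def lstm_step_backward(dnext_h, dnext_c, cache):
    """dnext_h, dnext_c: upstream dL/dh_t and dL/dc_t, each of shape (n, m)."""
    x, prev_h, prev_c, Wx, Wh, f, i, c_tilde, o, next_c = cache
    tanh_c = np.tanh(next_c)
    do = dnext_h * tanh_c
    dc = dnext_c + dnext_h * o * (1 - tanh_c ** 2)   # total gradient flowing into c_t
    df = dc * prev_c
    di = dc * c_tilde
    dc_tilde = dc * i
    dprev_c = dc * f
    # backprop through the gate nonlinearities into the packed pre-activation a
    da = np.concatenate([df * f * (1 - f),
                         di * i * (1 - i),
                         dc_tilde * (1 - c_tilde ** 2),
                         do * o * (1 - o)], axis=1)
    dx = da @ Wx.T
    dprev_h = da @ Wh.T
    dWx = x.T @ da
    dWh = prev_h.T @ da
    db = da.sum(axis=0)
    return dx, dprev_h, dprev_c, dWx, dWh, db
```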
[20 points] Application to Image Captioning
In this problem, you will apply the RNN module you implemented to build an image captioning model.
At every timestep we use a fully-connected layer to transform the RNN hidden vector at that timestep into scores for each word in the vocabulary. This is very similar to the fully-connected layer that you implemented in homework 1. Implement the forward pass in the temporal_fc_forward function and the backward pass in the temporal_fc_backward function in rnn_layers.py.
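A minimal NumPy sketch of a temporal fully-connected layer that applies the same affine transform independently at every timestep; shapes follow an (N, T, D) convention, and the exact signatures in rnn_layers.py may differ.

```python
import numpy as np

def temporal_fc_forward(x, w, b):
    """x: (N, T, D), w: (D, M), b: (M,) -> scores of shape (N, T, M)."""
    N, T, D = x.shape
    M = b.shape[0]
    out = (x.reshape(N * T, D) @ w + b).reshape(N, T, M)  # same affine map at every timestep
    cache = (x, w)
    return out, cache

def temporal_fc_backward(dout, cache):
    """dout: (N, T, M) upstream gradient."""
    x, w = cache
    N, T, D = x.shape
    M = dout.shape[2]
    dout_flat = dout.reshape(N * T, M)
    dx = (dout_flat @ w.T).reshape(N, T, D)
    dw = x.reshape(N * T, D).T @ dout_flat
    db = dout_flat.sum(axis=0)
    return dx, dw, db
```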
In an RNN language model, at every timestep we produce a score for each word in the vocabulary. We know the ground-truth word at each timestep, so we use a softmax loss function to compute the loss and gradient at each timestep. We sum the losses over time and average them over the minibatch. Since we operate over minibatches and different captions may have different lengths, we append NULL tokens to the end of each caption so they all have the same length. We don't want these NULL tokens to count toward the loss or gradient, so in addition to scores and ground-truth labels, our loss function also accepts a mask array that tells it which elements of the scores count toward the loss. This is very similar to the softmax loss layer that you implemented in homework 1. Implement the temporal_softmax_loss function in rnn_layers.py.
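A minimal NumPy sketch of the masked temporal softmax loss described above (sum over time, average over the minibatch); argument names are illustrative.

```python
import numpy as np

def temporal_softmax_loss(scores, y, mask):
    """scores: (N, T, V) word scores, y: (N, T) ground-truth word indices,
    mask: (N, T) boolean, True where the timestep counts toward the loss."""
    N, T, V = scores.shape
    flat = scores.reshape(N * T, V)
    y_flat = y.reshape(N * T)
    mask_flat = mask.reshape(N * T)

    # numerically stable softmax
    shifted = flat - flat.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)

    # masked cross-entropy: NULL timesteps contribute nothing to loss or gradient
    loss = -np.sum(mask_flat * np.log(probs[np.arange(N * T), y_flat])) / N

    dscores = probs.copy()
    dscores[np.arange(N * T), y_flat] -= 1
    dscores *= mask_flat[:, None] / N
    return loss, dscores.reshape(N, T, V)
```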
Now that you have implemented the necessary layers (you may also need the layers you implemented in hw1 in layers.py), you can combine them to build an image captioning model. Implement the forward and backward pass of the model in the loss function for both the 'rnn' case and the 'lstm' case, and the test-time forward pass in the sample function in rnn.py.
Considering both the vanilla RNN model and the LSTM model, run the script image_captioning.py to get the learning curves of the training loss and the learned captions for the samples.
Extra Points: Using the pieces you have implemented, you can try to train a captioning model that gives decent qualitative results when sampling on the validation set. You can subsample the training set for training if you want. In addition to qualitatively evaluating your model by inspecting its results, you can also quantitatively evaluate it using the BLEU unigram precision metric (the BLEU_score function in bleu_utils.py). The evaluate_model function is the evaluation code that is compatible with the NumPy model as defined above. Feel free to use PyTorch for this section if you'd like to train faster on a GPU. You should be able to adapt the evaluation code for PyTorch if you go that route.
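For reference, a minimal sketch of unigram precision (clipped unigram counts over candidate length); the provided BLEU_score function in bleu_utils.py may differ in details such as handling multiple references or a brevity penalty.

```python
from collections import Counter

def unigram_precision(candidate_tokens, reference_tokens):
    """Clipped unigram precision of a candidate caption against one reference."""
    cand_counts = Counter(candidate_tokens)
    ref_counts = Counter(reference_tokens)
    clipped = sum(min(count, ref_counts[tok]) for tok, count in cand_counts.items())
    return clipped / max(len(candidate_tokens), 1)
```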
[20 points] Application to text classification
In this problem, you will build and train models for sentiment classification. You will be provided labelled data for training your models. You will also be provided a set of unlabelled examples for which you will make predictions.
Use the provided labelled data data/{train.txt,dev.txt,test.txt} for building your models. Each line in these files is a sentence with the corresponding label at the beginning of the line. The data has been preprocessed, so you can simply do white-space tokenization and feed it to your models.
We will share a set of unlabelled examples with you a week before the deadline and you will turn in your predictions for those examples. You will also report the performance of your model on test.txt in your writeup for each of the following parts.
Note: You can use PyTorch layers to implement your models in this question.
Train a sentiment classifier using a bag-of-words input representation. The bag-of-words representation of a token sequence is a binary vector with ones corresponding to the tokens present in the sequence and zeros everywhere else.
Network structure: Bag of words → Linear → sigmoid
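A minimal PyTorch sketch of this part, assuming a word-to-index vocabulary built from the training files; module and helper names are illustrative.

```python
import torch
import torch.nn as nn

class BagOfWordsClassifier(nn.Module):
    """Bag of words -> Linear -> sigmoid."""
    def __init__(self, vocab_size):
        super().__init__()
        self.linear = nn.Linear(vocab_size, 1)

    def forward(self, bow):  # bow: (batch, vocab_size) binary vectors
        return torch.sigmoid(self.linear(bow)).squeeze(-1)

def to_bow(tokens, vocab):
    """Binary bag-of-words vector: ones for tokens present in the sentence."""
    vec = torch.zeros(len(vocab))
    for tok in tokens:
        if tok in vocab:
            vec[vocab[tok]] = 1.0
    return vec
```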
In the previous part, you used a sparse binary representation of the input sentence. Replace the binary representation with a word embedding layer and use average pooling to obtain a fixed-length representation of the sentence.
Network structure: Word embeddings → Average pooling → Linear → sigmoid
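A minimal sketch of the averaged-embedding variant, assuming padded index sequences and their true lengths as inputs; names and hyperparameters are illustrative.

```python
import torch
import torch.nn as nn

class AvgEmbeddingClassifier(nn.Module):
    """Word embeddings -> average pooling -> Linear -> sigmoid."""
    def __init__(self, vocab_size, embed_dim=100, pad_idx=0):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)
        self.linear = nn.Linear(embed_dim, 1)

    def forward(self, token_ids, lengths):
        # token_ids: (batch, max_len) padded indices; lengths: (batch,) true sentence lengths
        emb = self.embedding(token_ids)                          # (batch, max_len, embed_dim)
        pooled = emb.sum(dim=1) / lengths.unsqueeze(1).float()   # padding rows are zero, so this averages real tokens
        return torch.sigmoid(self.linear(pooled)).squeeze(-1)
```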
Repeat the previous part by initializing your word embeddings using pre-trained GloVe word embeddings.
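One way to initialize the embedding layer from pre-trained GloVe vectors, assuming a plain-text GloVe file in which each line is a word followed by its vector; the path and dimension below are placeholders.

```python
import numpy as np
import torch

def load_glove_embeddings(glove_path, vocab, embed_dim=100):
    """Build an embedding matrix aligned with `vocab` (word -> index)."""
    # words missing from GloVe keep a small random initialization
    weights = np.random.normal(scale=0.1, size=(len(vocab), embed_dim)).astype(np.float32)
    with open(glove_path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word, vec = parts[0], parts[1:]
            if word in vocab and len(vec) == embed_dim:
                weights[vocab[word]] = np.asarray(vec, dtype=np.float32)
    return torch.from_numpy(weights)

# e.g. model.embedding.weight.data.copy_(load_glove_embeddings("glove.6B.100d.txt", vocab))
```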
So far, our input representations have ignored the order information in the sentence. We will use a sequence model to perform classification in this part. Train an RNN-based classifier that reads in a given input sentence and predicts the label at the final time-step. Initialize your word embeddings with pre-trained GloVe word embeddings.
Network structure: Word embeddings → RNN → Linear → sigmoid
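A minimal sketch of the sequence classifier; hidden size and other hyperparameters are placeholders.

```python
import torch
import torch.nn as nn

class RNNClassifier(nn.Module):
    """Word embeddings -> RNN -> Linear -> sigmoid (prediction from the final timestep)."""
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128, pad_idx=0):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
        self.linear = nn.Linear(hidden_dim, 1)

    def forward(self, token_ids, lengths):
        emb = self.embedding(token_ids)                              # (batch, max_len, embed_dim)
        packed = nn.utils.rnn.pack_padded_sequence(
            emb, lengths.cpu(), batch_first=True, enforce_sorted=False)
        _, h_n = self.rnn(packed)                                    # h_n: (1, batch, hidden_dim)
        return torch.sigmoid(self.linear(h_n[-1])).squeeze(-1)      # hidden state at the final real timestep
```

For the next part, nn.LSTM is a drop-in replacement for nn.RNN here, except that its second return value is the tuple (h_n, c_n).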
Repeat the previous part replacing the vanilla RNN with an LSTM.
Network structure: Word embeddings → LSTM → Linear → sigmoid