[25 points] Text Classification using CNNs
In this problem, you will implement a CNN text classifier similar to the network of [1] for sentiment classification. The network has the following architecture:
Embedding layer → 1-d Convolution → Pooling → ReLU → Linear → Sigmoid
Assume that we perform global average-pooling in the pooling layer.
Assume the input to the convolution layer is given by $X \in \mathbb{R}^{N \times C \times H}$. Further assume that the temporal dimension (or sequence dimension) is the third dimension, of size $H$. Consider a convolutional kernel $W^{\text{conv}}$ and a bias vector $b \in \mathbb{R}^F$. The output of the 1-d convolution is given by $Y \in \mathbb{R}^{N \times F \times H''}$.
Express the output of the convolutional layer $Y_{n,f}$ as a function of $X_n$, $W_f^{\text{conv}}$, and $b_f$, based on the filter notation defined in homework 1.
What is the size of $Y_{n,f}$ in terms of $H$ and $H'$?
What is the size of the output of the pooling layer?
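As a sanity check on these sizes (assuming stride 1 and no padding, which are nn.Conv1d's defaults), the output length works out to $H'' = H - H' + 1$; the sizes below are illustrative, not fixed by the handout:

```python
import torch
import torch.nn as nn

# Shape check: with stride 1 and no padding, an input of length H convolved
# with a kernel of width H' yields an output of length H'' = H - H' + 1.
N, C, H, F, Hk = 4, 8, 20, 16, 5   # illustrative sizes only
conv = nn.Conv1d(in_channels=C, out_channels=F, kernel_size=Hk)
x = torch.randn(N, C, H)
print(conv(x).shape)               # torch.Size([4, 16, 16]); 16 = 20 - 5 + 1
```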
Implement the network as an nn.Module class called CNN in the empty file cnn.py.
Your module will need an __init__() and a forward() function.
The input to your forward() function will be a 2-d tensor of word ids of size $N \times H$. It will return logits of size $N$.
You will find nn.Conv1d useful in your implementation. Use a fixed kernel size of $H' = 5$.
Choose an appropriate number of feature maps $F$ (e.g., 128).
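A minimal sketch of one way the module could look; the vocabulary size, embedding dimension, padding index, and pooling switch below are our assumptions, not requirements of the handout:

```python
import torch
import torch.nn as nn

class CNN(nn.Module):
    """Embedding -> 1-d convolution -> global pooling -> ReLU -> Linear.

    Sketch only: vocab_size, embed_dim, and the pooling choice are
    placeholders. Returns raw logits of size N; apply the sigmoid outside
    (e.g., via nn.BCEWithLogitsLoss).
    """
    def __init__(self, vocab_size, embed_dim=128, num_filters=128,
                 kernel_size=5, pooling="avg"):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.conv = nn.Conv1d(embed_dim, num_filters, kernel_size)
        self.pooling = pooling
        self.fc = nn.Linear(num_filters, 1)

    def forward(self, word_ids):            # word_ids: (N, H) LongTensor
        x = self.embedding(word_ids)        # (N, H, E)
        x = x.transpose(1, 2)               # (N, E, H) as Conv1d expects
        y = self.conv(x)                    # (N, F, H'')
        if self.pooling == "avg":
            y = y.mean(dim=2)               # global average-pooling -> (N, F)
        else:
            y = y.max(dim=2).values         # global max-pooling -> (N, F)
        y = torch.relu(y)
        return self.fc(y).squeeze(1)        # logits of size N
```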
Train your model on the sentiment classification task from homework 2. You should pad your sequences so that all sequences in a batch have the same length (see the padding sketch after the list below). Report test accuracies for the following architectural choices and hyperparameters. You can either use pre-trained word vectors or train from scratch for this part.
Pooling: global average-pooling, global max-pooling
Kernel sizes: 5, 7
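For the padding step, one common pattern is a DataLoader collate function built on torch.nn.utils.rnn.pad_sequence; a sketch, assuming each dataset example is a (LongTensor of word ids, label) pair and pad id 0 (the helper name is ours, not the skeleton's):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

def collate_batch(examples):
    """Pad a list of (ids, label) pairs to the batch's maximum length.
    Assumes pad id 0; adjust to match your vocabulary."""
    ids = [ex[0] for ex in examples]
    labels = torch.tensor([float(ex[1]) for ex in examples])
    return pad_sequence(ids, batch_first=True, padding_value=0), labels

# Usage: loader = DataLoader(dataset, batch_size=32, collate_fn=collate_batch)
```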
[10 points] Siamese Networks for Learning Embeddings.
In this problem, you will implement a Siamese network for face verification in siamese_face.py. The datasets are att_faces.tar and lfw_faces.tar. You can download the data from https://drive.google.com/drive/folders/1Vpf8XctTtc-Swug7JE5YJU2URkvvxByV?usp=sharing
1. Implement the contrastive loss class ContrasiveLoss:
$$L(x^{(1)}, x^{(2)}, y) = (1 - y)\,\|f(x^{(1)}) - f(x^{(2)})\|_2^2 + y\left(\max\{0,\; m - \|f(x^{(1)}) - f(x^{(2)})\|_2\}\right)^2$$
where $m$ is the margin value and $y$ is the label denoting whether $x^{(1)}$ and $x^{(2)}$ are from the same person. For more details, you may want to read http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf
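A sketch of one way to transcribe this loss into a module (the class name follows the skeleton; the mean reduction over the batch, the default margin, and the convention that $y = 1$ marks different-person pairs are our assumptions, so check them against the skeleton):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContrasiveLoss(nn.Module):   # name kept to match the skeleton
    """Contrastive loss of Hadsell, Chopra & LeCun (2006).

    Assumes y == 0 for same-person pairs and y == 1 for different-person
    pairs, matching the equation above.
    """
    def __init__(self, margin=2.0):
        super().__init__()
        self.margin = margin

    def forward(self, out1, out2, y):
        d = F.pairwise_distance(out1, out2)                    # ||f(x1) - f(x2)||_2
        same = (1 - y) * d.pow(2)                              # pull same pairs together
        diff = y * torch.clamp(self.margin - d, min=0).pow(2)  # push different pairs apart, up to the margin
        return (same + diff).mean()
```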
Design the architecture and build the embedding network in the class SiameseNetwork, and train the Siamese network on face images in the att dataset. Report the learning curve of the loss on the training dataset and qualitative results on the training/testing datasets, i.e., figures showing a pair of images and the dissimilarity measurement between the two pictures. You may follow the skeleton of the architecture given in the code comments to get reasonable performance. You may also adjust the optimizer, learning rate, number of epochs, network architecture, image pre-processing, batch size, margin value, etc. to get even better performance. We expect the reported dissimilarity numbers to be consistent with visual similarity.
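A minimal sketch of the twin-branch pattern, assuming grayscale inputs; the trunk below is illustrative, and the architecture suggested in the skeleton's comments should take precedence:

```python
import torch.nn as nn

class SiameseNetwork(nn.Module):
    """Twin branches with shared weights: embed each image, return both."""
    def __init__(self, embed_dim=16):
        super().__init__()
        # Illustrative CNN trunk; follow the skeleton's suggested layers.
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # LazyLinear infers the flattened size, so the sketch works for
        # either dataset's image dimensions.
        self.head = nn.Sequential(nn.Flatten(), nn.LazyLinear(embed_dim))

    def embed(self, x):
        return self.head(self.features(x))

    def forward(self, x1, x2):
        # The same weights are applied to both inputs, so the two
        # embeddings are directly comparable.
        return self.embed(x1), self.embed(x2)
```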
Extra credit: Repeat the process above to train a Siamese network on face images in the lfw dataset. The given architecture skeleton and hyperparameters might not work for this larger dataset, so you may need to adjust the optimizer, learning rate, number of epochs, network architecture, image pre-processing, batch size, margin value, etc. to get reasonable performance.
[35 points] Conditional Variational Autoencoders.
In this problem, you will implement a conditional variational autoencoder (CVAE) from [2] and train it on the MNIST dataset.
1. Derive the variational lower bound of a conditional variational autoencoder. Show that:
$$\log p_\theta(x \mid y) \;\ge\; \mathcal{L}(\theta, \phi; x, y) = \mathbb{E}_{q_\phi(z \mid x, y)}\left[\log p_\theta(x \mid z, y)\right] - D_{\mathrm{KL}}\left(q_\phi(z \mid x, y)\,\|\,p_\theta(z \mid y)\right), \quad (1)$$
where $x$ is a binary vector of dimension $d$, $y$ is a one-hot vector of dimension $c$ defining a class, and $z$ is a vector of dimension $m$ sampled from the posterior distribution $q_\phi(z \mid x, y)$. The posterior distribution is modeled by a neural network with parameters $\phi$. The generative distribution $p_\theta(x \mid y)$ is modeled by another neural network with parameters $\theta$.
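Hint (one standard route, not the only one): write the marginal likelihood as an expectation under the variational posterior and apply Jensen's inequality:

$$\log p_\theta(x \mid y) = \log \mathbb{E}_{q_\phi(z \mid x, y)}\!\left[\frac{p_\theta(x, z \mid y)}{q_\phi(z \mid x, y)}\right] \;\ge\; \mathbb{E}_{q_\phi(z \mid x, y)}\!\left[\log \frac{p_\theta(x \mid z, y)\, p_\theta(z \mid y)}{q_\phi(z \mid x, y)}\right].$$

Splitting the logarithm into the likelihood term and the prior-over-posterior ratio then gives the two terms in Eq. (1).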
Derive the analytical solution to the KL-divergence between two Gaussian distributions, $D_{\mathrm{KL}}\left(q_\phi(z \mid x, y)\,\|\,p_\theta(z \mid y)\right)$. Let us assume that $p_\theta(z \mid y) \sim \mathcal{N}(0, I)$ and show that:
$$D_{\mathrm{KL}}\left(q_\phi(z \mid x, y)\,\|\,p_\theta(z \mid y)\right) = -\frac{1}{2} \sum_{j=1}^{J} \left(1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2\right), \quad (2)$$
where $\mu_j$ and $\sigma_j$ are the outputs of the neural network that estimates the parameters of the posterior distribution $q_\phi(z \mid x, y)$.
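For reference when you implement loss_function below, Eq. (2) translates directly into code; a sketch assuming the recognition network outputs mu and logvar $= \log \sigma^2$:

```python
import torch

def kl_divergence(mu, logvar):
    """Closed-form D_KL(q(z|x,y) || N(0,I)) from Eq. (2).
    Assumes mu, logvar have shape (batch, J); returns one value per example."""
    return -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1)
```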
Fill in the code for the CVAE network as an nn.Module class called CVAE in the file cvae.py.
Implement the recognition_model function $q_\phi(z \mid x, y)$.
Implement the generative_model function $p_\theta(x \mid z, y)$.
Implement the forward function by inferring the Gaussian parameters using the recognition model, sampling a latent variable using the reparametrization trick, and generating the data using the generative model.
Implement the variational lower bound loss_function $\mathcal{L}(\theta, \phi; x, y)$.
Train the CVAE and visualize the results.
If trained successfully, you should be able to sample images $x$ that reflect the given label $y$ from the noise vector $z$.
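A minimal sketch of how these pieces could fit together, assuming recognition_model returns (mu, log-variance) and inputs are concatenated with the one-hot label; all layer sizes are placeholders, not values from the handout:

```python
import torch
import torch.nn as nn

class CVAE(nn.Module):
    def __init__(self, d=784, c=10, m=20, hidden=400):   # placeholder sizes
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d + c, hidden), nn.ReLU())
        self.enc_mu = nn.Linear(hidden, m)
        self.enc_logvar = nn.Linear(hidden, m)
        self.dec = nn.Sequential(nn.Linear(m + c, hidden), nn.ReLU(),
                                 nn.Linear(hidden, d), nn.Sigmoid())

    def recognition_model(self, x, y):            # q_phi(z | x, y)
        h = self.enc(torch.cat([x, y], dim=1))
        return self.enc_mu(h), self.enc_logvar(h)

    def generative_model(self, z, y):             # p_theta(x | z, y)
        return self.dec(torch.cat([z, y], dim=1))

    def forward(self, x, y):
        mu, logvar = self.recognition_model(x, y)
        eps = torch.randn_like(mu)                # reparametrization trick:
        z = mu + eps * torch.exp(0.5 * logvar)    #   z = mu + sigma * eps
        return self.generative_model(z, y), mu, logvar

def loss_function(x_hat, x, mu, logvar):
    """Negative ELBO (minimized): reconstruction term plus Eq. (2) KL term."""
    recon = nn.functional.binary_cross_entropy(x_hat, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```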
[30 points] Generative Adversarial Networks.
In this problem, you will implement generative adversarial networks and train them on the MNIST dataset. Specifically, you will implement the Deep Convolutional Generative Adversarial Network (DCGAN) from [3]. In the generative adversarial network formulation, we have a generator network G that takes in a random vector z and a discriminator network D that takes in an input image x. The parameters of G and D are optimized via the adversarial objective:
$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right] + \mathbb{E}_{z \sim p(z)}\left[\log\left(1 - D(G(z))\right)\right]. \quad (3)$$
In practice, we alternate between training D and G where we train G to maximize:
$$\mathbb{E}_{z \sim p(z)}\left[\log D(G(z))\right], \quad (4)$$
and we follow by training D to maximize:
$$\mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right] + \mathbb{E}_{z \sim p(z)}\left[\log\left(1 - D(G(z))\right)\right]. \quad (5)$$
Therefore, the two separate optimizations make up one full training step. Given this information, you will do the following:
Fill in the code for the DCGAN network in gan.py. Descriptions of what should be filled in are written as comments in the code itself.
Implement the sample_noise function.
Implement the build_discriminator function.
Implement the build_generator function.
Implement the get_optimizer function.
Implement the bce_loss function.
Use the previously implemented bce_loss to implement the discriminator_loss function.
Use the previously implemented bce_loss to implement the generator_loss function.
Train your DCGAN!
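A sketch of how Eqs. (4) and (5) could map onto these functions. The stand-in bce_loss, the uniform noise prior, and the assumption that the discriminator outputs raw (pre-sigmoid) scores are ours, added only to keep the example self-contained; your skeleton's definitions take precedence:

```python
import torch
import torch.nn.functional as F

def bce_loss(scores, targets):
    # Stand-in for the bce_loss you implement above: numerically stable
    # binary cross-entropy computed on raw discriminator scores.
    return F.binary_cross_entropy_with_logits(scores, targets)

def sample_noise(batch_size, dim):
    # Noise prior p(z); uniform on [-1, 1] is an assumption (DCGAN-style).
    return 2 * torch.rand(batch_size, dim) - 1

def discriminator_loss(scores_real, scores_fake):
    # Eq. (5): maximize log D(x) + log(1 - D(G(z)))
    # == minimize BCE with target 1 for real images and 0 for fakes.
    loss_real = bce_loss(scores_real, torch.ones_like(scores_real))
    loss_fake = bce_loss(scores_fake, torch.zeros_like(scores_fake))
    return loss_real + loss_fake

def generator_loss(scores_fake):
    # Eq. (4): maximize log D(G(z)) == minimize BCE with target 1 on fakes.
    return bce_loss(scores_fake, torch.ones_like(scores_fake))
```

The two losses together implement one full training step: update D on discriminator_loss, then update G on generator_loss.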
If trained successfully, you should see the progression of sample quality getting better as training epochs increase.
References
[1] Yoon Kim. Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882, 2014.
[2] Kihyuk Sohn, Xinchen Yan, and Honglak Lee. Learning structured output representation using deep conditional generative models. In NeurIPS, 2015.
[3] Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. In ICLR, 2016.