Deep Learning: Assignment 3 Solution

Starting from:

~~$30~~

$24

Home

Implementation of CNN, RNN, LSTM, and GRU

This assignment involves the following tasks:

Image classi cation using CNN

Sentiment Analysis using RNN, LSTM, and GRU

Submit the executed code in Jupyter notebook. You can write your observations and results using the heading and markdown cells in Jupyter. If you have memory or GPU constraints, you can use Google colab or Kaggle. The links to the libraries and resources required are added in Moodle. If the training takes lot of time, then train in intervals by saving and restoring the models from checkpoints. You can learn about saving and restoring the models from the documentation of Tensor ow or Keras.

Image Classi cation using CNN:

The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. The task is to construct a CNN for CIFAR10 classi cation.

Fetch the training and test datasets for CIFAR10 using the built in functions in Tensor ow or Keras. Follow the links in Moodle, to know how to download the datasets.

Create a validation set of 10000 images from the training set.

Train the following CNN architectures as follows:

Model1:

The network comprises of 2 convolutional layers and 2 max pooling layers and 1 fully connected hidden layer.

Number of kernels in the rst convlayer =32 (5x5 lters) Number of kernels in the 2nd conv layer =64 (5x5 lters) Max pooling of size 2x2

Non-Linearity used in all hidden layers is ReLU The output layer is softmax

Number of units in the fully connected hidden layer1 = 64 Number of units in the output layer =10

The network architecture would be as given below:

Input ! conv1 (32 lters (5x5)) ! Maxpool (2x2) ! conv2 (64 lters (5x5)) ! Maxpool (2x2) ! Flatten ! FCL1(64) ! Softmax output layer(10)

Model2:

The network comprises of 2 convolutional layers and 2 max pooling layers and 1 fully connected hidden layer.

1

Number of kernels in the rst convlayer =32 (5x5 lters) Number of kernels in the 2nd conv layer =64 (5x5 lters) Max pooling of size 3x3

Non-Linearity used in all hidden layers is ReLU The output layer is softmax

Number of units in the fully connected hidden layer1 = 64 Number of units in the output layer =10

The network architecture would be as given below:

Input ! conv1 (32 lters (5x5)) ! Maxpool (3x3) ! conv2 (64 lters (5x5)) ! Maxpool (3x3) ! Flatten ! FCL1(64) ! Softmax output layer(10)

Model3:

The network comprises of 2 convolutional layers and 2 max pooling layers and 1 fully connected hidden layer.

Number of lter kernels in the rst convlayer =32 (3x3 lters) Number of lter kernel in the 2nd conv layer =64 (3x3 lters) Max pooling of size 2x2

Non-Linearity used in all hidden layers is ReLU The output layer is softmax

Number of units in the fully connected hidden layer1 = 64 Number of units in the output layer =10

The network architecture would be as given below:

Input ! conv1 (32 lters (3x3)) ! Maxpool (2x2) ! conv2 (64 lters (3x3)) ! Maxpool (2x2) ! Flatten ! FCL1(64) ! Softmax output layer(10)

Model4:

The network comprises of 3 convolutional layers and 2 max pooling layers and 1 fully connected hidden layer.

Number of kernels in the rst convlayer =32 (3x3 lters) Number of kernels in the 2nd conv layer =64 (3x3 lters) Number of kernels in the 3rd conv layer =128 (3x3 lters) Max pooling of size 2x2

Non-Linearity used in all hidden layers is ReLU The output layer is softmax

Number of units in the fully connected hidden layer1 = 64 Number of units in the output layer =10

The network architecture would be as given below:

Input ! conv1 (32 lters (3x3)) ! Maxpool (2x2) ! conv2 (64 lters (3x3)) ! Maxpool (2x2) ! conv3 (128 lters (3x3)) ! Maxpool (2x2) ! Flatten ! FCL1(64) ! Softmax output layer(10)

Which the best model among model1 to model4? Why?

Model5:

Choose the architecture of best model among model1, model2, model3, and model4. Add strides of 2 to all the convolution layers. If it is not possible to add strides due to negative dimension, choose the next best model.

2

Model6:

Choose the architecture of best model of model1, model2,model3, and model4. Add \SAME" padding to all the convolution layers.

Model7:

Choose the architecture of best model of model1, model2,model3, and model4. Add \SAME" padding to all the convolution layers with stride2. If it is not possible to add strides due to negative dimension, choose the next best model.

Which the best model among model1 to model7? Why?

Model8:

Choose the architecture of the best model among model1 to model7. Use tanh activation function in the hidden layers instead of ReLU

Model 9:

Choose the architecture of the best model among model1 to model7. Use sigmoid activation function in the hidden layers instead of ReLU

For each model trained, plot the loss vs iteration curve and accuracy vs iteration curve for training data.

Tabulate the following for model 1 to model 9. You can create the table by hand and then upload the screenshot of the table in the jupyter notebook.

Total number of trainable parameters. Training time

Training accuracy Validation accuracy Test accuracy

Sentiment analysis on IMDB movie review dataset:

The dataset comprises of 50,000 movie reviews taken from IMDb. The training and test data sets are uploaded in moodle. The datasets are in the form of text les. Every line in the les is a movie review. The training set and test set comprises of 25000 movie reviews each. For both training and test data sets, the rst 12500 reviews are positive and the rest of the reviews are negative.

Preprocess the dataset:

Remove punctuation and < br tag. Convert all the characters to lower case.

Create the target vector for training and test data. The rst 12500 values of target vector is 1 and the rest of the values are zero. The target vector is same for training and test data.

Create a validation set which is 20% of the training set.

Train the following models.

Model 1:

Integer encode the reviews

To make the length of all the reviews equal, pad the reviews to maximum length of 200. You can use pad sequences API in Tensor ow or Keras.

It is possible to learn the vector embedding of words using the embedding layer in Tensor ow or Keras. Follow the links in Moodle to learn about vector embeddings.

The network consists of an embedding layer, 1 RNN layer and output layer.

3

The embedding size of embedding layer = 128 Number of nodes in RNN layers = 200

Non-linearity used in all RNN layers is tanh. The output layer is sigmoid.

The network architecture is as follows:

Input ! Embeddding Layer (embedding size=128) ! RNN1(200 nodes) ! Sigmoid output

Model 2:

Integer encode the reviews

The network consists of an embedding layer, 1 LSTM layer and output layer. The embedding size of embedding layer = 128

Number of nodes in LSTM layers = 200

Non-linearity used in all LSTM layers is tanh. The output layer is sigmoid.

The network architecture is as follows:

Input ! Embeddding Layer (embedding size=128) ! LSTM1(200 nodes) ! Sigmoid out-put

Model 3:

Integer encode the reviews

The network consists of an embedding layer, 1 GRU layer and output layer. The embedding size of embedding layer = 128

Number of nodes in GRU layers = 200

Non-linearity used in all GRU layers is ReLU. The output layer is sigmoid.

The network architecture is as follows:

Input ! Embeddding Layer (embedding size=128) ! GRU1(200 nodes) ! Sigmoid output

Which is the best model among model1 to model3 - RNN, LSTM or GRU? Why?

Model 4:

Integer encode the reviews

The network consists of an embedding layer, 2 (GRU/LSTM/RNN based of the best per-forming model in model1 to model3) layers and output layer.

All other con gurations are same as model1.

Model 5:

Integer encode the reviews

The network consists of an embedding layer, 3 (GRU/LSTM/RNN based of the best per-forming model in model1 to model3) layers and output layer.

All other con gurations are same as model1.

Model 6:

Use word2vec to create word embeddings of length 128

Create vector representation of reviews from word embedding

Choose the best performing model among model1 to model5. Remove the embedding layer in the best model and give the vector embeddings obtained by word2vec as input.

4

The network architecture is as follows:

Input(Vector embeddings obtained using word2vec) ! RNN/LSTM/GRU layers (the num-ber of layers and nodes same as the best model among model 1 to model5) ! Sigmoid output layer

Plot the loss vs iteration and accuracy vs iteration curve for training data for all the models.

Tabulate the training, testing and validation accuracy for all the models.

5