Assignment 4 Solution


Instructions

Please use Google Classroom to upload your submission by the deadline mentioned above. Your submission should consist of a single ZIP file, named <Your Roll No> Assign4, containing all your solutions, including code.

For late submissions, 10% is deducted for each day (weekends included) past the assignment deadline. Note that each student begins the course with 7 grace days for late submission of assignments. Late submissions automatically draw on your grace-day balance, if you have any left. You can see your balance in the CS5370 Marks and Grace Days document.

You have to use Python for the programming questions.

Please read the department plagiarism policy. Do not engage in any form of cheating; strict penalties will be imposed on both givers and takers. Please talk to the instructor or a TA if you have concerns.

• Convolutional Neural Networks (27 marks)

In this problem, we will train a convolutional neural network for a task known as image colourization. That is, given a greyscale image, we wish to predict the colour at each pixel. This is a difficult problem for many reasons, one of which is that it is ill-posed: for a single greyscale image, there can be multiple, equally valid colourings.

We recommend that you use Colab (https://colab.research.google.com/) for this assignment. In the assignment zip file, you will find two Python notebook files: colour regression.ipynb and colourization.ipynb. To set up the Colab environment, upload the two notebook files using the upload tab at https://colab.research.google.com/.

We will use the CIFAR-10 data set, which consists of images of size 32 × 32 pixels. For most of the questions, we will use a subset of the dataset. The data loading script is included with the notebooks, and should download the data automatically the first time it is loaded. If you have trouble downloading the file, you can also do so manually from the provided cifar-10-python.tar.gz. To make the problem easier, we will only use the "Horse" category from this data set.

1. Colourization as Regression (5 marks): Image colourization can be posed as a regression problem, where we build a model to predict the RGB intensities at each pixel given the greyscale input. In this case, the outputs are continuous, and so mean-squared error can be used to train the model. A set of weights for such a model is included with the assignment. In this question, you will get familiar with training neural networks using cloud GPUs. Read the code in colour regression.py, and answer the following questions.

(a) Describe the model RegressionCNN. How many convolution layers does it have? What are the filter sizes and number of filters at each layer? Construct a table or draw a diagram.

(b) Run all the notebook cells in colour regression.ipynb on Colab (no coding involved). You will train a CNN and generate some images showing validation outputs. For how many epochs is the CNN trained in the given setting?

(c) Re-train a couple of new models using a different number of training epochs. You may train each new model in a new code cell by copying and modifying the code from the last notebook cell. Comment on how the results (output images, training loss) change as we increase or decrease the number of epochs.

(d) A colour space¹ is a choice of mapping of colours into three-dimensional coordinates. Some colours could be close together in one colour space, but further apart in others. The RGB colour space is probably the most familiar to you, but most state-of-the-art colourization models do not use the RGB colour space. The model used in colour regression.ipynb computes squared error in RGB colour space. How could using the RGB colour space be problematic?

(e) Most state-of-the-art colourization models frame colourization as a classification problem instead of a regression problem. Why? (Hint: what does minimizing squared error encourage?)
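The hint in (e) can be explored numerically. The following is a toy sketch, not part of the assignment code: it assumes a single pixel whose two equally valid colourings are pure red and pure blue (hypothetical values), and searches a coarse grid of RGB predictions for the one with the lowest mean squared error.

```python
from itertools import product

# Hypothetical targets: two equally plausible colours for the same grey pixel.
targets = [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]  # red, blue


def mse(pred, targets):
    """Mean squared error of one RGB prediction against all targets."""
    total = 0.0
    for t in targets:
        total += sum((p - c) ** 2 for p, c in zip(pred, t))
    return total / (len(targets) * 3)


# Coarse grid over RGB values in steps of 0.25.
levels = [0.0, 0.25, 0.5, 0.75, 1.0]
best = min(product(levels, repeat=3), key=lambda pred: mse(pred, targets))

# Per-channel average of the targets, for comparison.
mean_colour = tuple(sum(ch) / len(targets) for ch in zip(*targets))

print("MSE-optimal prediction:", best)
print("Mean of the targets:   ", mean_colour)
```

In this toy setting the squared-error minimiser coincides with the average of the two targets, a colour that matches neither of them. A classifier over discrete colours can instead assign probability to each candidate separately.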

2. Colourization as Classification (2+2=4 marks): We will select a subset of 24 colours and frame colourization as a pixel-wise classification problem, where we label each pixel with one of 24 colours. The 24 colours are selected by running k-means clustering over colours and taking the cluster centers. This has already been done for you, and the cluster centers are provided in colour/colour kmeans*.npy files. For simplicity, we still measure distance in RGB space. This is not ideal but reduces the dependencies for this assignment. Open the notebook colourization.ipynb and answer the following questions.

(a) Complete the model CNN in colourization.ipynb. This model should have the same layers and convolutional filters as the RegressionCNN, with the exception of the output layer. Continue to use PyTorch layers like nn.ReLU, nn.BatchNorm2d and nn.MaxPool2d; however, we will not use nn.Conv2d. Instead, we will use our own convolution layer, MyConv2d, included in the file, to better understand its internals.

(b) Run the main training loop of CNN in colourization.ipynb on Colab. This will train a CNN for a few epochs using the cross-entropy objective. It will generate some images showing the trained result at the end. How do the results compare to the previous regression model?
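Framing colourization as classification requires a class label for every pixel. The following is a minimal sketch of that labelling step, not the notebook's implementation: it assumes a hypothetical 3-colour palette in place of the 24 provided k-means centers, and the helper name `nearest_colour_index` is invented for illustration. Each pixel receives the index of the closest center in RGB space.

```python
def nearest_colour_index(pixel, centres):
    """Return the index of the centre closest to `pixel` in squared RGB distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centres)), key=lambda k: sq_dist(pixel, centres[k]))


# Hypothetical 3-colour palette (the assignment uses 24 k-means centres).
centres = [(0.9, 0.1, 0.1), (0.1, 0.8, 0.2), (0.2, 0.2, 0.9)]

# A 2x2 RGB image as nested lists of (r, g, b) tuples.
image = [
    [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    [(0.3, 0.3, 1.0), (0.8, 0.3, 0.2)],
]

# Per-pixel class labels: the targets a cross-entropy loss would be trained on.
labels = [[nearest_colour_index(p, centres) for p in row] for row in image]
print(labels)
```

The resulting integer map has the same height and width as the image, so the network's output layer needs one score per palette colour at every pixel.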

3. Skip Connections (2+3+1=6 marks): A skip connection in a neural network is a connection which skips one or more layers and connects to a later layer. We will introduce skip connections.

(a) Add a skip connection from the first layer to the last, second layer to the second last, etc. That is, the final convolution should have both the output of the previous layer and the initial greyscale input as input. This type of skip connection results in a "UNet" model². Following the CNN class that you have completed, complete the __init__ and forward methods of the UNet class. (Hint: You will need to use the function torch.cat.)

(b) Train the "UNet" model for the same number of epochs as the previous CNN and plot the training curve using a batch size of 100. How does the result compare to the previous model? Did skip connections improve the validation loss and accuracy? Did the skip connections improve the output qualitatively? How? Give at least two reasons why skip connections might improve the performance of our CNN models.

¹ https://en.wikipedia.org/wiki/colour_space

² Ronneberger et al., U-Net: Convolutional Networks for Biomedical Image Segmentation, MICCAI 2015

(c) Re-train a few more "UNet" models using different mini-batch sizes with a fixed number of epochs. Describe the effect of batch sizes on the training/validation loss, and the final image output.
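The channel bookkeeping behind the skip connections in (a) can be sketched without PyTorch. The example below is an illustration only, not the UNet solution: it represents a feature map as a plain list of channels, uses a hypothetical count of 32 feature channels, and mimics what concatenation along the channel axis does (in PyTorch, `torch.cat` with `dim=1` for tensors laid out as batch, channel, height, width).

```python
def concat_channels(a, b):
    """Concatenate two feature maps along the channel axis.

    Each feature map is a list of channels; each channel is a 2-D list (H x W).
    Spatial sizes must match, which is why skips pair layers of equal resolution.
    """
    if (len(a[0]), len(a[0][0])) != (len(b[0]), len(b[0][0])):
        raise ValueError("spatial sizes differ; cannot concatenate")
    return a + b


def zeros(channels, height, width):
    """Build an all-zero feature map with the given shape."""
    return [[[0.0] * width for _ in range(height)] for _ in range(channels)]


greyscale_input = zeros(1, 32, 32)    # 1 channel
previous_output = zeros(32, 32, 32)   # hypothetical: 32 feature channels

final_conv_input = concat_channels(previous_output, greyscale_input)
print(len(final_conv_input))  # number of channels seen by the final convolution
```

The point of the sketch is that a convolution fed by a skip connection must be declared with the sum of the two channel counts as its number of input channels.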

4. Super-resolution (1+3=4 marks): Many classic image processing problems, e.g. colourization, denoising, and super-resolution, amount to transforming an input image into an output image via a transformation pipeline. These image processing tasks share many similarities: the inputs are lower-quality images and the outputs are the restored high-quality images. Instead of hand-designing the transformations, one approach is to learn the transformation pipeline from a training dataset using supervised learning. Previously, you have trained conv nets for colourization. In this question, you will use the same conv net models to solve super-resolution tasks. In the super-resolution task, we aim to recover a high-resolution image from a low-resolution input.

(a) Take a look at the data processing function process. What is the resolution difference between the downsized input image and the output image?

(b) Bilinear interpolation³ is one of the basic but widely used resampling techniques in image processing. Run super-resolution with both CNN and UNet. Is there any difference in the model outputs? Also, comment on how the neural network results (images from the third row) differ from the bilinear interpolation results (images from the fourth row). Give at least two reasons why conv nets are better than bilinear interpolation.
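For reference when answering (b), here is a minimal pure-Python sketch of bilinear interpolation on a greyscale image. It assumes corner-aligned sampling (the first and last output pixels coincide with the first and last input pixels); libraries differ in their conventions, and the function name `bilinear_resize` is hypothetical rather than taken from the notebook.

```python
def bilinear_resize(img, out_h, out_w):
    """Resize a 2-D list of floats with bilinear interpolation (corner-aligned)."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for i in range(out_h):
        # Map the output row index to a fractional source row.
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            # Map the output column index to a fractional source column.
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            # Interpolate horizontally on the two rows, then vertically.
            top = (1 - wx) * img[y0][x0] + wx * img[y0][x1]
            bottom = (1 - wx) * img[y1][x0] + wx * img[y1][x1]
            row.append((1 - wy) * top + wy * bottom)
        out.append(row)
    return out


low_res = [[0.0, 1.0],
           [1.0, 0.0]]
high_res = bilinear_resize(low_res, 3, 3)
for r in high_res:
    print(r)
```

Every output value is a fixed weighted average of at most four input pixels, with weights that depend only on position and not on image content.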

5. Visualizing Intermediate Activations (3 marks): We will visualize the intermediate activations for several inputs. Run the visualization block in colourization.ipynb that has already been written for you. For each model, a list of images will be generated and stored in the cs6360/a2/outputs/model name/act0/ folder in the Colab environment. You will need to use the left side panel (the "Table of contents" panel) to find these images under the Files tab.

(a) Visualize the activations of the CNN for a few test examples. How are the activations in the first few layers different from those in the later layers? You do not need to attach the output images to your writeup, only descriptions of what you see.

(b) Visualize the activations of the colourization UNet for a few test examples. How do the activations differ from the CNN activations?

(c) Visualize the activations of the super-resolution UNet for a few test examples. Describe how the activations differ from the colourization models.

6. Some Conceptual Questions (2+1+1+1=5 marks):

(a) We did not tune any hyperparameters for this assignment other than the number of epochs and batch size. What are some hyperparameters that could be tuned? List five. Try any one and report what you observe for the colourization problem.

(b) In the RegressionCNN model, nn.MaxPool2d layers are applied after nn.ReLU activations. Comment on how the output of the CNN changes if we switch the order of the max-pooling and ReLU.

(c) The loss functions and the evaluation metrics in this assignment are defined at pixel level. In general, these pixel-level measures correlate poorly with human assessment of visual quality. How can we improve the evaluation to match human assessment better? (Hint: You may find this paper useful for answering this question.)

³ https://en.wikipedia.org/wiki/Bilinear_interpolation

(d) In colourization.ipynb, we trained a few different image processing convolutional neural networks on input and output images of size 32 × 32. At test time, the desired output size is often different from the one used in training. Describe how we can modify the trained models in this assignment to colourize test images that are larger than 32 × 32.
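Question 6(b) can be probed with a small experiment on the forward pass. The sketch below is a pure-Python stand-in, not the notebook's layers: it assumes non-overlapping 2 × 2 max pooling and a random 4 × 4 feature map, and compares pooling after ReLU with ReLU after pooling.

```python
import random


def relu(x):
    """Elementwise rectifier for a single value."""
    return max(x, 0.0)


def relu_map(img):
    """Apply ReLU to every entry of a 2-D list."""
    return [[relu(v) for v in row] for row in img]


def max_pool_2x2(img):
    """Non-overlapping 2x2 max pooling on a 2-D list with even side lengths."""
    return [
        [max(img[i][j], img[i][j + 1], img[i + 1][j], img[i + 1][j + 1])
         for j in range(0, len(img[0]), 2)]
        for i in range(0, len(img), 2)
    ]


random.seed(0)
feature_map = [[random.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(4)]

pool_after_relu = max_pool_2x2(relu_map(feature_map))
relu_after_pool = relu_map(max_pool_2x2(feature_map))
print(pool_after_relu == relu_after_pool)
```

Because ReLU never reorders values, taking the maximum before or after it selects the same entry. The two orderings do differ in how many elements ReLU is applied to.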

• Recurrent Neural Networks (13 marks)

In this problem, you will work on extending min-char-rnn.py, the vanilla RNN language model written by Andrej Karpathy⁴. You will experiment with the Shakespeare dataset, provided with this assignment.

1. (2+2=4 marks) The RNN language model uses a softmax activation function for its output distribution at each time step. It's possible to modify the distribution by multiplying the logits by a constant $\alpha$:

$$y = \mathrm{softmax}(\alpha z)$$

Here, $1/\alpha$ can be thought of as a temperature, i.e. lower values of $\alpha$ correspond to a hotter distribution. (This terminology comes from an algorithm called simulated annealing.) Write a function to sample text from the model using different temperatures (i.e., $1/\alpha$ values). Try different temperatures, and, in your report, include examples of texts generated using different temperatures. Briefly discuss what difference the temperature makes. Include the source code of the function you wrote/modified to accomplish the task in the report. You should either train the RNN yourself, or use the weights from Part 3 (later here); it is up to you.
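The scaling described above can be sketched in isolation from the RNN. The example below is a minimal illustration, not the function requested in the question: the constant that multiplies the logits is written `alpha` in the code, the logits are hypothetical values for a 3-character vocabulary, and the helper names are invented.

```python
import math
import random


def softmax_with_alpha(logits, alpha):
    """Return softmax(alpha * z); 1/alpha plays the role of the temperature."""
    scaled = [alpha * z for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(probs, rng):
    """Draw one index from a discrete distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.0]  # hypothetical logits for a 3-character vocabulary
rng = random.Random(0)
for alpha in (0.5, 1.0, 5.0):
    probs = softmax_with_alpha(logits, alpha)
    draws = [sample_index(probs, rng) for _ in range(10)]
    print(f"alpha={alpha}: probs={[round(p, 3) for p in probs]} draws={draws}")
```

Small values of `alpha` flatten the distribution toward uniform, while large values concentrate the probability mass on the largest logit.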

2. (2+2=4 marks) Write a function that uses an RNN to complete a string. That is, the RNN should generate text that is a plausible continuation of a given starter string. In order to do that, you will need to compute the hidden activity h at the end of the starter string, and then to start generating new text. Include 5 interesting examples of outputs that your network generated using a starter string. (This part need not be easily reproducible). Include the source code of the function you wrote in the report. You should either train the RNN yourself, or use the weights from Part 3 - up to you.
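The two phases described in this question, priming the hidden state on the starter string and then generating, can be sketched as follows. This is an assumption-laden illustration rather than a solution: the weights are small random stand-ins (so the output is gibberish), the vocabulary and hidden size are hypothetical, and only the weight names `Wxh`, `Whh`, `Why`, `bh`, `by` follow min-char-rnn.py.

```python
import math
import random

chars = sorted(set("hello world"))
char_to_ix = {c: i for i, c in enumerate(chars)}
V, H = len(chars), 8  # vocabulary size, hidden size (hypothetical)

rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


# Untrained stand-in weights, named after the variables in min-char-rnn.py.
Wxh, Whh, Why = rand_matrix(H, V), rand_matrix(H, H), rand_matrix(V, H)
bh, by = [0.0] * H, [0.0] * V


def step(h, ix):
    """One RNN step: consume character index ix, return (new_h, next-char probs)."""
    # With a one-hot input, Wxh @ x is simply column ix of Wxh.
    h_new = [
        math.tanh(Wxh[i][ix] + sum(Whh[i][j] * h[j] for j in range(H)) + bh[i])
        for i in range(H)
    ]
    y = [sum(Why[k][i] * h_new[i] for i in range(H)) + by[k] for k in range(V)]
    m = max(y)
    e = [math.exp(v - m) for v in y]
    s = sum(e)
    return h_new, [v / s for v in e]


def complete(starter, n):
    """Feed `starter` through the RNN to prime h, then sample n more characters."""
    h = [0.0] * H
    probs = None
    for c in starter:          # priming: no sampling, only update h
        h, probs = step(h, char_to_ix[c])
    out = []
    for _ in range(n):         # generation: sample, then feed the sample back in
        ix = rng.choices(range(V), weights=probs, k=1)[0]
        out.append(chars[ix])
        h, probs = step(h, ix)
    return starter + "".join(out)


print(complete("hello", 10))
```

With trained weights in place of the random ones, the same loop structure would produce continuations conditioned on the starter string.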

3. (3 marks) The weights for a trained RNN are included as char-rnn-snapshot.npz. Some samples from the RNN (at temperature $1/\alpha = 1$) are included as samples.txt, and code to read in the weights is included as read in npz.py (if this doesn't work, try the pickle file, and get it using import cPickle as pickle; a = pickle.load(open("char-rnn-snapshot.pkl"))).

In the samples that the RNN generated, it seems that a newline or a space usually follows the colon (i.e., ":") character. In the weight data provided, identify the specific weights that are responsible for this behavior by the RNN. In your report, specify the coordinates and values of the weights you identified, and explain how those weights make the RNN generate newlines and spaces after colons.

4. (2 marks) Identify another interesting behaviour of the RNN, and identify the weights that are responsible for it. Specify the coordinates and the values of the weights, and explain how those weights lead to the behavior that you identified.

• Generative Adversarial Networks (GANs) (10 marks)

This question is adapted from the online course, CS231N, with due credit to the creators there. In this problem, you will work on learning how to implement GANs and using them for supervised classification.

⁴ https://gist.github.com/karpathy/d4dee566867f8291f086

You are provided with an IPython notebook file along with the MNIST dataset. Follow the instructions provided in the IPython notebook for the rest of the assignment, fill in the code as required, and answer the inline questions at the end. (In case you have any issues with the datasets provided, let us know on Classroom.) (7 marks for the code, and 3 marks for the inline questions)