BMI / CS 771 Homework Assignment 2

1    Overview

This assignment is about learning convolutional and Transformer neural networks for image classification. You will implement, design and train different types of deep networks for scene recognition using PyTorch, an open-source deep learning package. Moreover, you will take a closer look at the learned networks by (1) identifying important image regions for classification; (2) generating adversarial samples to confuse your model; and (3) training models to defend against those adversarial samples (bonus). This assignment is team-based and requires cloud computing. A team can have up to 3 students. The assignment has a total of 12 points with 2 bonus points. Details and rubric are provided in Section 3.

2    Setup

We recommend using Conda to manage your packages.

The following packages are needed: PyTorch (≥1.10, with GPU support), OpenCV (≥3), NumPy, gdown, and Tensorboard. Again, you are in charge of installing them.

You can debug your code and run experiments on CPUs. However, training a neural network is very expensive on CPUs. GPU computing is thus required for this project. Please set up your team's cloud instance, and do remember to shut down the instance when it is not in use!

You will download the MiniPlaces dataset for Part II & III of the project. We have included a script for downloading and unpacking the dataset (assuming all dependencies are installed). Simply run

sh ./download_dataset.sh

You will need to fill in the missing code in:

./code/student_code.py








Your submission should include the code, results and a writeup. The submission can be generated using:

python ./zip_submission.py


3    Details

This assignment has three parts. An autograder will be used to grade some parts of the assignment. Please follow the instructions closely.

3.1    Understand Convolutions (2 Pts)

In the first part, you will implement the 2D convolution operation, a fundamental component of deep convolutional neural networks. Specifically, a 2D convolution is defined as

$Y = W \ast_S X + b$    (1)
Input: $X$ is a 2D feature map of size $C_i \times H_i \times W_i$ (following PyTorch's convention). $H_i$ and $W_i$ are the height and width of the 2D map and $C_i$ is the number of input feature channels.

Weight: $W$ defines the convolution filters and is of size $C_o \times C_i \times K \times K$, where $K$ is the kernel size. We only consider square filters. $W$ will be learned from data.

Stride: $\ast_S$ denotes the convolution operation with stride $S$, where $S$ is the step size of the sliding window when $W$ convolves with $X$. We will only consider equal stride sizes along the height and width.

Bias: $b$ is the bias term of size $C_o$, and is added at every spatial location of the $H_o \times W_o$ output map after the convolution. Again, $b$ will be learned from data.

Padding: Padding is often used before the convolution. We only consider equal padding along all sides of the feature map. A (zero) padding of size $P$ adds zero-valued features to each side of the 2D map.

Output: $Y$ is the output feature map of size $C_o \times H_o \times W_o$, where $H_o = (H_i + 2P - K)/S + 1$ and $W_o = (W_i + 2P - K)/S + 1$.
Helper Code: We have provided some helper functions for the implementation (./code/student_code.py). You will need to fill in the missing code in the class CustomConv2DFunction. You can use the fold / unfold functions and any matrix / tensor operations provided by PyTorch, except the convolution functions. Please do not modify the code in the class CustomConv2d. This is the module wrapper for your code.


Requirements: You will implement both the forward pass and the backward propagation for this 2D convolution operation. The implementation should work with any kernel size $K$, input and output feature channels $C_i$ / $C_o$, stride $S$ and padding $P$. Importantly, your implementation must compute $Y$ given the input $X$ and parameters $W$ and $b$, as well as the gradients $\partial Y / \partial X$, $\partial Y / \partial W$ and $\partial Y / \partial b$. All derivations of the gradients can be found in our course material, except $\partial Y / \partial b$ (implementation provided in the helper code). In your writeup, please describe your implementation.
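As a rough illustration of how fold / unfold can be used, the forward pass can be phrased as an unfold followed by a matrix multiplication. This is only a sketch under assumed tensor shapes, not the required CustomConv2DFunction implementation (which must also save tensors for the backward pass):

import torch
import torch.nn.functional as F

def conv2d_forward_sketch(x, w, b, stride=1, padding=0):
    # x: (N, Ci, Hi, Wi), w: (Co, Ci, K, K), b: (Co,)
    N, Ci, Hi, Wi = x.shape
    Co, _, K, _ = w.shape
    Ho = (Hi + 2 * padding - K) // stride + 1
    Wo = (Wi + 2 * padding - K) // stride + 1
    # unfold extracts sliding K x K patches: (N, Ci*K*K, Ho*Wo)
    cols = F.unfold(x, kernel_size=K, stride=stride, padding=padding)
    # (Co, Ci*K*K) x (N, Ci*K*K, Ho*Wo) -> (N, Co, Ho*Wo), plus the bias
    out = w.view(Co, -1) @ cols + b.view(1, Co, 1)
    return out.reshape(N, Co, Ho, Wo)

The backward pass can reuse the same unfolded patches for $\partial Y / \partial W$ and can use fold to scatter gradients back to the input for $\partial Y / \partial X$.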


Testing Code: How can you make sure that your implementation is correct? You can compare your forward pass / backward propagation results with PyTorch's own Conv2d implementation. You can also compare your gradients with the numerical gradients. We have included sample testing code in ./code/test_conv.py. Please make sure your code can pass this test.
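One additional lightweight check is torch.autograd.gradcheck on small double-precision inputs. The call below assumes CustomConv2DFunction.apply takes (input, weight, bias, stride, padding); please adapt it to the actual signature in ./code/student_code.py:

import torch
from student_code import CustomConv2DFunction  # run from the ./code folder

# gradcheck needs double precision and small inputs to be reliable
x = torch.randn(1, 2, 8, 8, dtype=torch.double, requires_grad=True)
w = torch.randn(4, 2, 3, 3, dtype=torch.double, requires_grad=True)
b = torch.randn(4, dtype=torch.double, requires_grad=True)

ok = torch.autograd.gradcheck(
    lambda xx, ww, bb: CustomConv2DFunction.apply(xx, ww, bb, 1, 1), (x, w, b))
print("gradcheck passed:", ok)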


3.2    Design and Train a Deep Neural Network

In the second part, you will design and train convolutional and Transformer neural networks for scene classification using the MiniPlaces dataset.

MiniPlaces Dataset: MiniPlaces is a scene recognition dataset developed by MIT. This dataset has 120K images from 100 scene categories. The categories are mutually exclusive. The dataset is split into 100K images for training, 10K images for validation and 10K for testing. You will need to manually download this dataset. The images and annotations will be located under ./data. We will evaluate top-1/5 accuracy on the validation set as our performance metric. For more details about the dataset, please refer to their GitHub page https://github.com/CSAILVision/miniplaces.

Downloading the Dataset: A script has been included to download and unpack the dataset. Please follow the instructions in Section 2 (Setup). If successful, the data folder will contain two sub-folders "images" and "objects," as well as two text files "train.txt" and "val.txt," in addition to data.tar.gz.

Helper Code: We have provided helper code for training and testing a deep model (./code/main.py). You will run this script many times but it is unlikely that you will need to modify this file. If you do modify this file, please describe the modification and its justification in your writeup. To check how to use this script, run

python ./main.py --help

Other helper code includes:

Dataloader (./code/custom_dataloader.py) for the MiniPlaces dataset.

Image augmentations (./code/custom_transforms.py), which also provide a reference solution for HW1.

Transformer blocks (./code/custom_blocks.py) for the implementation of vision Transformer models.








Finally, a simple convolutional neural network is implemented by SimpleNet in ./code/student_code.py. You will likely need to modify this class for designing your own model.


Monitor the Training: All intermediate results during training, including training loss, learning rate, and train/validation accuracy, are logged into files under ./logs. You can monitor and visualize these variables using

tensorboard --logdir=../logs

We recommend copying the ../logs folder to a local machine and using Tensorboard locally to inspect the curves. This way, you can avoid setting up a Tensorboard server on the cloud. Make sure you back up and clean the log folder after each run of an experiment. These curves should be included in your writeup.

Requirements: You will design and train a deep network for scene recognition. Your model must be trained using only the training set. Using labels of the validation set for training, or using ImageNet pre-trained weights, is not allowed unless otherwise specified.

A Simple Convolutional Network (1 Pt): Let us start by training our first deep network from scratch. No coding is needed in this section; we provide the dataloader and a simple network to start with. You can run

python ./main.py ../data --epochs=60

Importantly, a GPU is needed for the training. The training might take a few hours and will give you a model with 40%+ top-1 accuracy on the validation set. Do remember to run your training inside a persistent terminal session, e.g., tmux or screen, so that your process won't get killed when your SSH session expires. You can also use

watch -n 0.1 nvidia-smi

to monitor GPU utilization and memory consumption. Once the training is done, the best model will be saved as ./models/model_best.pth.tar. The saved model will be overwritten with each new experiment, so make sure you back up the models when necessary. You can evaluate this model by

python ./main.py ../data --resume=../models/model_best.pth.tar -e


Train with Your Own Convolutions (1 Pt): As a step forward, we will use our own convolution to replace PyTorch’s version and train the model for 10 epochs. This can be done by

python ./main.py ../data --epochs=10 --use-custom-conv

How is your implementation different from PyTorch's version in terms of memory consumption, training speed, and loss curve? What factors might have produced the differences? Please describe your findings in the writeup.

Implement a Vision Transformer (2 Pts): Beyond a convolutional network, let us now implement a Transformer-based model [1]. Transformer blocks that support local window self-attention are already implemented in our helper code (./code/custom_blocks.py). You will need to review the implementation and figure out how to use these blocks to build a vision Transformer using SimpleViT in ./code/student_code.py. To train the Transformer model, run

python ./main.py ../data --epochs=90 --wd 0.05 --lr 0.01 --use-vit


Here we decrease the learning rate and increase the weight decay and the number of training epochs, partially due to the use of the AdamW optimizer [5]. With our default parameters, the model should give a performance level similar to the simple convolutional network. You are welcome to play with the default parameters of SimpleViT as well as the hyperparameters for training, yet keep in mind that the model could take a much longer time to train. In your writeup, please describe the design of your vision Transformer model, its training scheme, and the results.
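For reference, the training flags above roughly correspond to the following optimizer setup (main.py already handles this; the model variable is illustrative):

import torch
# assuming `model` is your SimpleViT instance
optimizer = torch.optim.AdamW(model.parameters(), lr=0.01, weight_decay=0.05)

AdamW decouples the weight decay from the gradient-based update [5].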

Design Your Own Network (1 Pt): Now let us try to improve the simple networks. You can choose to focus on either the convolutional network or the Transformer network. Your goal is to design a "better" network for this recognition task. There are a couple of things you can explore here. For example, you can add more convolutional layers [8], yet the model might start to diverge during training. This divergence can be avoided by adding residual connections [3] and/or batch normalization [4]; a sketch of such a block is shown below. You might also want to try different types of self-attention for the Transformer. You can also tweak the hyper-parameters for training, e.g., initial learning rate, weight decay, number of training epochs, and type of data augmentation. Most of the hyper-parameters can be passed as arguments to main.py. In all cases, you should implement your network in student_code.py and call main.py for training. A good architecture strikes a balance between efficiency and accuracy.
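A minimal sketch of a residual block with batch normalization, in the spirit of [3, 4]. The class name and channel arguments are illustrative and not part of the provided code:

import torch
import torch.nn as nn

class BasicResBlock(nn.Module):
    # conv-BN-ReLU-conv-BN with a skip connection
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        # 1x1 projection on the skip path when the shape changes
        self.shortcut = nn.Identity()
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch))

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + self.shortcut(x))

Stacking a few such blocks inside SimpleNet (or replacing its plain convolution stages) is one reasonable starting point for this part.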


Please describe and justify your design of the model and the training scheme, and present your results in the writeup (including training curves and training/validation accuracy).

Fine-Tune a Pre-trained Model (1 Pt): As the final step, we will fine-tune a residual network (18 layers) pre-trained on ImageNet [3]. The implementation is included in the helper code, and you can run

python ./main.py ../data --epochs=60 --use-resnet18

How does your model compare to this pre-trained ResNet18? For an in-depth comparison, you can look at the training curves and the training and validation accuracy. Please include the comparison in your writeup.


3.3    Attention and Adversarial Samples

In the final part, we will look at attention maps and adversarial samples. They represent two critical aspects of deep neural networks, interpretability and robustness, and thus will help us gain insight into these networks. For this section, we will only consider convolutional networks.

Helper Code: Helper code is provided in ./code/main.py and student_code.py for visualizing attention maps and generating adversarial samples. For attention maps, you will fill in the missing code in the class GradAttention. For adversarial samples, you need to complete the class PGDAttack. For adversarial training, you will have to modify part of SimpleNet.


Requirements: You will implement methods for generating attention maps and adversarial samples, and optionally perform adversarial training as a defense against adversarial samples.

Saliency Maps (2 Pts): Suppose you have a trained model. If you minimize the loss of the predicted label and compute the gradient of the loss w.r.t. the input, the magnitude of a pixel's gradient indicates the importance of that pixel for the decision. You can create a 2D attention map by (1) computing the input gradient by minimizing the loss of the predicted label (the most confident prediction); (2) taking the absolute values of the gradients; and (3) picking the maximum value across the three color channels. This method was discussed in [7]. Once you finish the coding, run

python ./main.py ../data --resume=../models/model best.pth.tar -e -v


This command will evaluate your trained model (assuming model_best.pth.tar) and visualize the attention maps. All attention maps will be saved under ./logs. Again, you can use Tensorboard to visualize the results:

tensorboard --logdir=../logs

Now you will see a new tab named "Image", where you can scroll the slider on top to see samples from different batches. You can also zoom in on an image by clicking on it. Please include and discuss the visualization in your writeup.
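A minimal sketch of the saliency computation described above, assuming a generic classifier and input batch. The function name and interface are illustrative; the graded interface is the GradAttention class:

import torch

def saliency_map_sketch(model, images):
    # gradient-based saliency: |d loss / d input|, max over color channels
    model.eval()
    images = images.clone().requires_grad_(True)
    outputs = model(images)
    preds = outputs.argmax(dim=1)  # most confident prediction as the label
    loss = torch.nn.functional.cross_entropy(outputs, preds)
    loss.backward()
    # absolute gradient, max over the 3 color channels -> (N, 1, H, W)
    return images.grad.abs().max(dim=1, keepdim=True)[0]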

Adversarial Samples (2 Pts): Interestingly, by minimizing the loss of an incorrect label and computing the gradient of the loss w.r.t. the input, one can create adversarial samples that will confuse a model! This was first presented in [9] and further analyzed in [2]. We will consider the least confident label as a proxy for the incorrect label, and you will implement Projected Gradient Descent (PGD) under the $\ell_\infty$ norm from [6]. Specifically, PGD takes several steps of the fast gradient sign method; at each step, PGD also clips the result to the $\epsilon$-neighborhood of the input. This implementation, however, requires some thought. The gradient operations should not be recorded by PyTorch, as doing so would create a computational graph that grows indefinitely over time. Again, you can call main.py once you complete the implementation:

python ./main.py ../data --resume=../models/model best.pth.tar -a -v


This command will generate adversarial samples on the validation set and try to attack your model. You will see how the accuracy drops (significantly!). Moreover, the adversarial samples will be saved in the "logs" folder, and you can use Tensorboard to check them. This time, you will find the tabs "Org Image" and "Adv Image". Can you see the difference between the original images and the adversarial samples? What happens if you increase the number of iterations and reduce the error bound ($\epsilon$)? Please discuss your implementation of PGD and present the results (accuracy drop and adversarial samples) in your writeup.
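A minimal sketch of $\ell_\infty$ PGD with the least-confident label, assuming a generic classifier. The function name and default values are illustrative; the graded interface is the PGDAttack class:

import torch

def pgd_attack_sketch(model, images, eps=0.03, step_size=0.01, num_steps=10):
    # l_inf PGD: repeated FGSM-style steps, projected back to the eps-ball
    x_orig = images.detach()
    x_adv = x_orig.clone()
    with torch.no_grad():
        target = model(x_orig).argmin(dim=1)  # least confident label
    for _ in range(num_steps):
        x_adv.requires_grad_(True)
        loss = torch.nn.functional.cross_entropy(model(x_adv), target)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            # minimize the loss of the incorrect label: step against the gradient
            x_adv = x_adv - step_size * grad.sign()
            # project back into the eps-neighborhood of the original input
            x_adv = x_orig + torch.clamp(x_adv - x_orig, -eps, eps)
    return x_adv.detach()

Note that the update steps run under torch.no_grad(), so the attack itself does not keep extending the autograd graph, which addresses the concern raised above.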


Adversarial Training (Bonus +2 Pts): A deep model should be robust against adversarial samples. As we discussed in the lecture, a possible solution is adversarial training, as described in [2, 6]. The key idea is to generate adversarial samples and feed these samples into the network during training. To implement adversarial training, you can attach your PGD to the forward function in SimpleNet (see the comments in the code for details). Unfortunately, this training can be 10x more expensive than normal training. To accelerate the process, we recommend (1) reducing the number of steps in PGD and (2) reducing the number of epochs in training. The goal is to show that, compared to a model trained normally, your adversarially trained model has a better chance of defending against adversarial attacks. Please discuss your experimental design, and present your results in the writeup.
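A minimal sketch of one adversarial training step, reusing the pgd_attack_sketch above; the assignment's intended hook is inside SimpleNet's forward function, so treat this only as an illustration of the idea:

import torch

def adversarial_training_step(model, images, labels, optimizer):
    # generate adversarial samples with few PGD steps to keep training affordable
    model.eval()  # avoid updating BatchNorm statistics while attacking (a common choice)
    adv_images = pgd_attack_sketch(model, images, num_steps=2)
    model.train()
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(adv_images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()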

4    Writeup

For this assignment, and all other assignments, you must submit a project report in PDF. Every team member should submit the same copy of the report. For teams with more than one member, please clearly identify the contributions of all members. In the report you will describe your algorithm and any decisions you made in writing your algorithm a particular way. Then you will show and discuss the results of your algorithm. For this project, we have included detailed instructions for the writeup in each part of the project. You can also discuss anything extra you did. Feel free to add other information you feel is relevant.

5    Handing in

This is very important as you will lose points if you do not follow instructions.

Every time after the first that you do not follow instructions, you will lose 5%.

The folder you hand in must contain the following:

code/ - directory containing all your code for this assignment.
writeup/ - directory containing your report for this assignment.
results/ - directory containing your results.

Do not use absolute paths in your code (e.g. /user/classes/proj1). Your code will break if you use absolute paths and you will lose points because of it. Simply use relative paths as the starter code already does. Do not turn in the data / logs / models folders. Hand in your project as a zip file through Canvas. You can create this zip file using python zip_submission.py.







References

    [1] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale. In ICLR, 2021.

    [2] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. In ICLR, 2015.

    [3] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016.

    [4] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015.

    [5] I. Loshchilov and F. Hutter. Decoupled weight decay regularization. In ICLR, 2019.

    [6] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. In ICLR, 2018.

    [7] K. Simonyan, A. Vedaldi, and A. Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. In ICLR, 2014.

    [8] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.

    [9] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. In ICLR, 2014.



























