Introduction
The objective of this session is to observe the impact of residual connections and batch normalization on the gradient norm at different depths in a residual network.
You can start this session from an embryo of code that includes an implementation of a residual network and an example of plotting a graph with Matplotlib:
https://fleuret.org/dlc/src/dlc_practical_6_embryo.py
• Modification of the ResNet implementation
Edit the implementations of ResNet and ResNetBlock so that you can pass two Boolean flags skip_connections and batch_normalization to specify whether these features are activated or not.
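For example, assuming the block structure of the embryo (conv1, bn1, conv2, bn2), the two flags can simply gate the batch-norm calls and the additive shortcut in the forward pass. This is a sketch, not the only possible organization:

import torch
from torch import nn
from torch.nn import functional as F

class ResNetBlock(nn.Module):
    def __init__(self, nb_channels, kernel_size,
                 skip_connections = True, batch_normalization = True):
        super().__init__()
        self.skip_connections = skip_connections
        self.batch_normalization = batch_normalization
        # The batch-norm layers are created in all cases, and are
        # simply not used in forward when the flag is off
        self.conv1 = nn.Conv2d(nb_channels, nb_channels,
                               kernel_size = kernel_size,
                               padding = (kernel_size - 1) // 2)
        self.bn1 = nn.BatchNorm2d(nb_channels)
        self.conv2 = nn.Conv2d(nb_channels, nb_channels,
                               kernel_size = kernel_size,
                               padding = (kernel_size - 1) // 2)
        self.bn2 = nn.BatchNorm2d(nb_channels)

    def forward(self, x):
        y = self.conv1(x)
        if self.batch_normalization: y = self.bn1(y)
        y = F.relu(y)
        y = self.conv2(y)
        if self.batch_normalization: y = self.bn2(y)
        if self.skip_connections: y = y + x
        y = F.relu(y)
        return y

The ResNet constructor then only has to forward the two flags to every ResNetBlock it creates.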
• Monitoring the gradient norm
Write a function get_stats(skip_connections, batch_normalization) that
1. creates a model with 30 residual blocks, 10 channels, 3×3 kernels,
2. computes the norm of the gradient of the cross-entropy with respect to the weights of the first convolutional layer of each residual block, on 100 individual samples,
3. returns the resulting 30×100 tensor.
Hint: You can create a list of the weight tensors of the first convolutional layer of each block with:
monitored_parameters = [ b.conv1.weight for b in model.resnet_blocks ]
and use it to get the gradient norm for each.
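A minimal sketch of get_stats, re-using the hint above. The exact ResNet constructor arguments, and the train_input / train_target tensors holding the training data, are assumptions to adapt to the embryo and to how you load the data:

import torch
from torch import nn

def get_stats(skip_connections, batch_normalization):
    # Assumed constructor signature, to adapt to the embryo's ResNet
    model = ResNet(nb_residual_blocks = 30, nb_channels = 10,
                   kernel_size = 3,
                   skip_connections = skip_connections,
                   batch_normalization = batch_normalization)

    criterion = nn.CrossEntropyLoss()

    monitored_parameters = [ b.conv1.weight for b in model.resnet_blocks ]

    nb_samples = 100
    result = torch.empty(len(monitored_parameters), nb_samples)

    for n in range(nb_samples):
        # Process one individual sample (mini-batch of size 1)
        output = model(train_input[n:n+1])
        loss = criterion(output, train_target[n:n+1])
        model.zero_grad()
        loss.backward()
        # Store the gradient norm at each depth for this sample
        for d, p in enumerate(monitored_parameters):
            result[d, n] = p.grad.norm()

    return result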
• Graph
For each of the four configurations of the two Boolean flags skip_connections and batch_normalization, plot the average of the gradient norm vs. depth.
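For example, with Matplotlib; the logarithmic scale on the y-axis is a choice, since without skip connections the norms may span several orders of magnitude across depths:

import matplotlib.pyplot as plt

fig, ax = plt.subplots()

for skip_connections in (False, True):
    for batch_normalization in (False, True):
        stats = get_stats(skip_connections, batch_normalization)
        # Average over the 100 samples at each depth
        ax.plot(stats.mean(dim = 1).numpy(),
                label = f'skip = {skip_connections}, bn = {batch_normalization}')

ax.set_xlabel('Depth (residual block index)')
ax.set_ylabel('Average gradient norm')
ax.set_yscale('log')
ax.legend()
plt.show()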
If you use a notebook, you can set the Matplotlib backend to the 'inline' one to have graphs appear in it with
%matplotlib inline