Introduction
The objective of this session is to observe the impact of residual connections and batch normalization on the gradient norm at different depths in a residual network.
You can start this session from an embryo of code that includes an implementation of a residual network and an example of plotting a graph with Matplotlib:
https://fleuret.org/dlc/src/dlc_practical_6_embryo.py
• Modification of the ResNet implementation
Edit the implementations of ResNet and ResNetBlock so that you can pass two Boolean flags skip_connections and batch_normalization to specify whether these features are activated or not.
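For example, assuming the block structure of the embryo (conv1, bn1, conv2, bn2), the two flags can simply gate the batch-norm calls and the additive shortcut in the forward pass. This is a sketch, not the only possible organization:

import torch
from torch import nn
from torch.nn import functional as F

class ResNetBlock(nn.Module):
    def __init__(self, nb_channels, kernel_size,
                 skip_connections = True, batch_normalization = True):
        super().__init__()
        self.skip_connections = skip_connections
        self.batch_normalization = batch_normalization
        # The batch-norm layers are created in all cases, and are
        # simply not used in forward when the flag is off
        self.conv1 = nn.Conv2d(nb_channels, nb_channels,
                               kernel_size = kernel_size,
                               padding = (kernel_size - 1) // 2)
        self.bn1 = nn.BatchNorm2d(nb_channels)
        self.conv2 = nn.Conv2d(nb_channels, nb_channels,
                               kernel_size = kernel_size,
                               padding = (kernel_size - 1) // 2)
        self.bn2 = nn.BatchNorm2d(nb_channels)

    def forward(self, x):
        y = self.conv1(x)
        if self.batch_normalization: y = self.bn1(y)
        y = F.relu(y)
        y = self.conv2(y)
        if self.batch_normalization: y = self.bn2(y)
        if self.skip_connections: y = y + x
        y = F.relu(y)
        return y

The ResNet constructor then only has to forward the two flags to every ResNetBlock it creates.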
• Monitoring the gradient norm
Write a function get_stats(skip_connections, batch_normalization) that
1. creates a model with 30 residual blocks, 10 channels, 3×3 kernels,
2. computes the norm of the gradient of the cross-entropy with respect to the weights of the first convolutional layer of each residual block, on 100 individual samples,
3. returns the resulting 30×100 tensor.
Hint: You can create a list of the weight tensors of the first convolutional layer of each block with:
monitored_parameters = [ b.conv1.weight for b in model.resnet_blocks ]
and use it to get the gradient norm for each.
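A minimal sketch of get_stats, re-using the hint above. The exact ResNet constructor arguments, and the train_input / train_target tensors holding the training data, are assumptions to adapt to the embryo and to how you load the data:

import torch
from torch import nn

def get_stats(skip_connections, batch_normalization):
    # Assumed constructor signature, to adapt to the embryo's ResNet
    model = ResNet(nb_residual_blocks = 30, nb_channels = 10,
                   kernel_size = 3,
                   skip_connections = skip_connections,
                   batch_normalization = batch_normalization)

    criterion = nn.CrossEntropyLoss()

    monitored_parameters = [ b.conv1.weight for b in model.resnet_blocks ]

    nb_samples = 100
    result = torch.empty(len(monitored_parameters), nb_samples)

    for n in range(nb_samples):
        # Process one individual sample (mini-batch of size 1)
        output = model(train_input[n:n+1])
        loss = criterion(output, train_target[n:n+1])
        model.zero_grad()
        loss.backward()
        # Store the gradient norm at each depth for this sample
        for d, p in enumerate(monitored_parameters):
            result[d, n] = p.grad.norm()

    return result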
• Graph
For each of the four configurations of the two Boolean flags skip_connections and batch_normalization, plot the average of the gradient norm vs. depth.
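For example, with Matplotlib; the logarithmic scale on the y-axis is a choice, since without skip connections the norms may span several orders of magnitude across depths:

import matplotlib.pyplot as plt

fig, ax = plt.subplots()

for skip_connections in (False, True):
    for batch_normalization in (False, True):
        stats = get_stats(skip_connections, batch_normalization)
        # Average over the 100 samples at each depth
        ax.plot(stats.mean(dim = 1).numpy(),
                label = f'skip = {skip_connections}, bn = {batch_normalization}')

ax.set_xlabel('Depth (residual block index)')
ax.set_ylabel('Average gradient norm')
ax.set_yscale('log')
ax.legend()
plt.show()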
If you use a notebook, you can set the Matplotlib backend to the 'inline' one to have graphs appear in it with
%matplotlib inline