Mini-project 2: Write your own blocks


The goal of the previous section was to exploit PyTorch to build a network. Here you will build your own framework for denoising images without using autograd or torch.nn modules. We allow

    from torch import empty, cat, arange
    from torch.nn.functional import fold, unfold

and the Python Standard Library [1] only. We may consider additional operations if you demonstrate a convincing use case (please ask the TAs on Slack or by email). Note that you can achieve most elementary tensor operations using the .foo() methods or mathematical operators, e.g. instead of torch.abs(a) or torch.sqrt(a) consider a.abs() or a ** .5.

[1] https://docs.python.org/3/library/

Your code should work with autograd globally off, which can be achieved with

torch.set_grad_enabled(False)
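As a rough illustration of working within these restrictions, the sketch below allocates tensors with empty and fills them in place, then uses method and operator forms for elementwise math. The variable names are only illustrative, and the plain import of torch is used solely to switch autograd off.

```python
from torch import empty
import torch  # used here only to switch autograd off

torch.set_grad_enabled(False)

# Allocate with `empty` and fill in place instead of calling torch.randn or torch.zeros.
weight = empty(4, 3).uniform_(-1.0, 1.0)
zeros = empty(4, 3).zero_()

# Method and operator forms of common elementwise functions.
magnitude = weight.abs()                    # instead of torch.abs(weight)
root = magnitude ** 0.5                     # instead of torch.sqrt(magnitude)
row_norm = (weight ** 2).sum(dim=1) ** 0.5  # Euclidean norm of each row

print(weight.requires_grad, root.shape, row_norm.shape)
```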

Specifically, you will implement the following blocks, which you may have used in the previous problem:

• Convolution layer.
• Transpose convolution layer, or alternatively a combination of Nearest neighbor upsampling + Convolution.
• Upsampling layer, which is usually implemented with transposed convolution, but you can alternatively use a combination of Nearest neighbor upsampling + Convolution for this mini-project.
• ReLU
• Sigmoid
• A container like torch.nn.Sequential to put together an arbitrary configuration of modules.
• Mean Squared Error as a loss function
• Stochastic Gradient Descent (SGD) optimizer
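For the last two blocks, a minimal sketch might look as follows. The class names MSE and SGD are the ones required for submission; the method names (step, zero_grad), the constructor arguments, and the choice to pass SGD a list of (parameter, gradient) pairs are assumptions made for illustration, not a prescribed interface.

```python
from torch import empty


class MSE:
    """Mean squared error, averaged over every element."""

    def forward(self, prediction, target):
        self.diff = prediction - target
        return (self.diff ** 2).mean()

    def backward(self):
        # d/d(prediction) of mean((p - t)^2) is 2 (p - t) / number_of_elements.
        return 2.0 * self.diff / self.diff.numel()


class SGD:
    """Plain SGD over a list of (parameter, gradient) pairs (assumed layout)."""

    def __init__(self, params, lr):
        self.params = params
        self.lr = lr

    def step(self):
        for p, g in self.params:
            p.sub_(self.lr * g)  # in-place update: p <- p - lr * g

    def zero_grad(self):
        for _, g in self.params:
            g.zero_()


# Toy usage: pull a 3-vector towards a constant target.
w = empty(3).zero_()
grad = empty(3).zero_()
target = empty(3).fill_(1.0)
loss_fn, optimizer = MSE(), SGD([(w, grad)], lr=0.5)
for _ in range(20):
    optimizer.zero_grad()
    loss = loss_fn.forward(w, target)
    grad.add_(loss_fn.backward())  # accumulate, as a module's backward would
    optimizer.step()
print(loss.item())
```

In this toy loop the error shrinks by a constant factor at every step, so the printed loss should be close to zero.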

With these blocks, build the following network:

    Sequential(Conv(stride 2),
               ReLU,
               Conv(stride 2),
               ReLU,
               Upsampling,
               ReLU,
               Upsampling,
               Sigmoid)

As specified before, you can implement Upsampling with Nearest neighbor upsampling + Convolution. Some suggestions for implementing the Convolution layer are given in the appendix.
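The nearest neighbor half of that combination can be sketched with tensor methods alone. The class name NearestUpsampling is the one required for submission; the scale_factor argument and the restriction to integer factors are assumptions of this sketch. A full Upsampling block would follow it with a convolution.

```python
from torch import arange


class NearestUpsampling:
    def __init__(self, scale_factor):
        self.s = scale_factor  # assumed to be a positive integer

    def forward(self, x):
        s = self.s
        # Repeat every row and every column s times.
        return x.repeat_interleave(s, dim=2).repeat_interleave(s, dim=3)

    def backward(self, grad_output):
        s = self.s
        n, c, h, w = grad_output.shape
        # Each input pixel was copied into an s x s block of the output,
        # so its gradient is the sum over that block.
        return grad_output.reshape(n, c, h // s, s, w // s, s).sum(dim=(3, 5))

    def param(self):
        return []


x = arange(4.0).view(1, 1, 2, 2)
up = NearestUpsampling(2)
y = up.forward(x)   # shape (1, 1, 4, 4)
g = up.backward(y)  # shape (1, 1, 2, 2)
print(y[0, 0])
print(g[0, 0])
```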

Training data

Reuse the .pkl files from the first project.

Suggested structure to implement your modules

You are free to develop any new ideas you want, and grading will reward originality. However, we ask that your modules can be instantiated with similar arguments, and in the same order, as their PyTorch counterparts. The suggested simple structure is to define a class

    class Module(object):
        def forward(self, *input):
            raise NotImplementedError

        def backward(self, *gradwrtoutput):
            raise NotImplementedError

        def param(self):
            return []

and to implement several modules and losses that inherit from it.

Each such module may have tensor parameters, in which case it should also have, for each parameter, a gradient tensor of the same size to accumulate the gradients during the backward pass, and

• forward should get as input, and return, a tensor or a tuple of tensors.
• backward should get as input a tensor or a tuple of tensors containing the gradient of the loss with respect to the module's output, accumulate the gradient with respect to the parameters, and return a tensor or a tuple of tensors containing the gradient of the loss with respect to the module's input.
• param should return a list of pairs, each composed of a parameter tensor and a gradient tensor of the same size. This list should be empty for parameterless modules (such as ReLU).

Some modules may require additional methods, and some modules may keep track of information from the forward pass to be used in the backward.
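To make the suggested structure concrete, here is one possible sketch of two parameterless modules and a container built on Module. Caching the mask and the output in forward is one way of keeping information for the backward pass; the attribute names (mask, out, modules) are illustrative assumptions.

```python
from torch import empty


class Module(object):
    def forward(self, *input):
        raise NotImplementedError

    def backward(self, *gradwrtoutput):
        raise NotImplementedError

    def param(self):
        return []


class ReLU(Module):
    def forward(self, x):
        self.mask = x > 0  # remembered for the backward pass
        return x * self.mask

    def backward(self, grad_output):
        return grad_output * self.mask


class Sigmoid(Module):
    def forward(self, x):
        self.out = 1.0 / (1.0 + (-x).exp())
        return self.out

    def backward(self, grad_output):
        # d sigmoid(x) / dx = sigmoid(x) * (1 - sigmoid(x))
        return grad_output * self.out * (1.0 - self.out)


class Sequential(Module):
    def __init__(self, *modules):
        self.modules = list(modules)

    def forward(self, x):
        for m in self.modules:
            x = m.forward(x)
        return x

    def backward(self, grad_output):
        # Propagate the gradient through the modules in reverse order.
        for m in reversed(self.modules):
            grad_output = m.backward(grad_output)
        return grad_output

    def param(self):
        return [pair for m in self.modules for pair in m.param()]


net = Sequential(ReLU(), Sigmoid())
x = empty(2, 3).normal_()
y = net.forward(x)
grad_x = net.backward(empty(2, 3).fill_(1.0))
print(y.shape, grad_x.shape, net.param())
```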

Evaluation and Submission

As in the first mini-project, your submission for the second mini-project will be evaluated on its correctness and performance. Additionally, we will test each module's implementation for correctness using automated testing, where each module's forward and backward passes will be assessed on custom inputs. Thus you must name your modules Conv2d, TransposeConv2d or NearestUpsampling, ReLU, Sigmoid, MSE, SGD, Sequential.

Similar to model.py from mini-project 1, we expect the submission to have train and test functionality. You can use pickle format to save your model.

    ### For mini-project 2
    class Model():
        def __init__(self) -> None:
            ## instantiate model + optimizer + loss function + any other stuff you need
            pass

        def load_pretrained_model(self) -> None:
            ## This loads the parameters saved in bestmodel.pth into the model
            pass

        def train(self, train_input, train_target, num_epochs) -> None:
            #: train_input: tensor of size (N, C, H, W) containing a noisy version of the images.
            #: train_target: tensor of size (N, C, H, W) containing another noisy version of the same images, which only differs from the input by their noise.
            pass

        def predict(self, test_input) -> torch.Tensor:
            #: test_input: tensor of size (N1, C, H, W) with values in range 0-255 that has to be denoised by the trained or the loaded network.
            #: returns a tensor of the size (N1, C, H, W) with values in range 0-255.
            pass

Additionally, each of your implemented modules needs to be accessible from the model.py file, so importing the different modules with from model import Conv2d, TransposeConv2d... should work. The test.py file contains additional details and some sample testing methods that indicate how your code is going to be evaluated.

Details on Automated Testing

To test your code, our testing framework will instantiate your modules with the same arguments as their PyTorch counterparts. It will then explicitly call the forward function of your module. The forward function should take a Tensor as input and return only the output Tensor. If you need your forward function to return more than the output Tensor during training, you can use additional kwargs to adapt the return value. For the convolution module, we additionally ask that your class expose weight and bias parameters that match the ones used in PyTorch. In the Model class, the predict method takes as input a Tensor with values in range 0-255 and should return a Tensor with values in range 0-255.

Saving the model

For the purposes of saving the trained network's state, you can choose to store each of the modules' states in a pickle file. In the function load_pretrained_model, you can read each of those states.
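One way this could look is sketched below, assuming each module exposes param() as described earlier. The helpers save_state and load_state and the one-parameter Scale module are hypothetical names introduced only for this illustration, and the file is written to a temporary directory.

```python
import os
import pickle
import tempfile

from torch import empty


class Scale:
    """Hypothetical one-parameter module, used only to exercise the helpers."""

    def __init__(self):
        self.weight = empty(3).normal_()
        self.weight_grad = empty(3).zero_()

    def param(self):
        return [(self.weight, self.weight_grad)]


def save_state(modules, path):
    # Keep only the parameter tensors; gradients do not need to be stored.
    state = [[p for p, _ in m.param()] for m in modules]
    with open(path, "wb") as f:
        pickle.dump(state, f)


def load_state(modules, path):
    with open(path, "rb") as f:
        state = pickle.load(f)
    for m, tensors in zip(modules, state):
        for (p, _), saved in zip(m.param(), tensors):
            p.copy_(saved)  # in-place copy keeps the module's own tensor objects


trained, fresh = Scale(), Scale()
path = os.path.join(tempfile.mkdtemp(), "bestmodel.pth")
save_state([trained], path)
load_state([fresh], path)
print((fresh.weight - trained.weight).abs().max().item())  # 0.0
```

Copying in place, rather than rebinding the attributes, means any optimizer that already holds references to the parameter tensors keeps seeing the loaded values.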

Report and Submission details

Each mini-project should include a report of at most 3 pages. The report will be graded on clarity and completeness. It should include qualitative and quantitative results for the different approaches, explanations of your choices, and ablations justifying these choices. If you are unsure what constitutes a discussion of results, consult the original Noise2Noise paper linked at the beginning of this document for ideas. Comment on the experimental, architectural, and implementation choices that failed. Your code will be evaluated on quality, clarity of implementation, and correctness.

Appendix: Implementing Convolution

Much of the difficulty of the second mini-project arises from coding up convolution. This optional appendix gives some hints on how to simplify this computation. If you implement convolution using the hints given here, you can additionally import fold and unfold from torch.nn.functional.

A.1 Convolutions can be seen as linear layers

If $f : \mathbb{R}^{c \times h \times w} \to \mathbb{R}^{c' \times h' \times w'}$ represents the action of a bias-less convolution, then it is straightforwardly seen [2] that $f$ satisfies the properties: (1) for any scalar $a$, $f(ax) = a f(x)$, and (2) $f(x + y) = f(x) + f(y)$. For example:

    import torch

    if __name__ == "__main__":
        kernel_size = (2, 2)

        x = torch.randn((1, 3, 32, 32))
        y = torch.randn((1, 3, 32, 32))
        a = torch.randn((1,))

        out_channels = 4

        conv = torch.nn.Conv2d(in_channels=x.shape[1],
                               out_channels=out_channels,
                               kernel_size=kernel_size,
                               bias=False)

        torch.testing.assert_allclose(a * conv(x), conv(a * x))
        torch.testing.assert_allclose(conv(x + y), conv(x) + conv(y))

Mathematically, this means that convolution can be thought of as multiplication by a $c'h'w' \times chw$ matrix after some reshaping. Thus, when developing a deep learning framework, if we already know how to apply a linear layer, we can implement a convolution layer by writing the convolution as a linear function, applying our logic for linear operations, and reinterpreting the output as a convolution. This is true of both the forward and backward operations.

Thus, most of the complications in implementing convolutions are due to the bookkeeping of treating convolution as a linear layer. The dimensionality of the output, given the input and the settings of the convolution (kernel size, etc.), can be computed using the formulae given in the shape section of https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html.
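For reference, the per-dimension formula from that shape section can be written as a small helper. The function name conv_output_size is an assumption of this sketch; the arithmetic follows the documented formula for a single spatial dimension.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one spatial dimension of a convolution."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# A 32-pixel side with a 2-wide kernel and stride 2 is halved.
print(conv_output_size(32, 2, stride=2))  # 16
```

Apply it once to the height and once to the width, using the matching entries of kernel_size, stride, padding and dilation when these are given as pairs.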

Note that we have not discussed the bias term; it likewise needs to be reshaped and treated correctly. The idea is that if $f$ is a convolution without bias, and $\mathrm{flatten}(f(x)) = W\,\mathrm{flatten}(x)$ for some $W \in \mathbb{R}^{c'h'w' \times chw}$, then after flattening, a convolution with bias can be written as $W\,\mathrm{flatten}(x) + b$ for some $b \in \mathbb{R}^{c'h'w'}$.
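The matrix $W$ and vector $b$ can be recovered numerically for a small case, which may help when checking your bookkeeping. The sketch below is a demonstration only: it uses torch.nn.Conv2d and other functions outside the allowed imports, and the sizes are arbitrary choices.

```python
import torch

torch.manual_seed(0)
c, h, w = 2, 4, 4
conv = torch.nn.Conv2d(c, 3, kernel_size=2, bias=True)

with torch.no_grad():
    # b is the flattened output for an all-zero input.
    b = conv(torch.zeros(1, c, h, w)).flatten()
    # Column j of W is the bias-free response to the j-th standard basis image.
    basis = torch.eye(c * h * w).view(-1, c, h, w)
    W = (conv(basis).flatten(start_dim=1) - b).T

    x = torch.randn(1, c, h, w)
    via_matrix = W @ x.flatten() + b
    via_conv = conv(x).flatten()

print(W.shape)  # torch.Size([27, 32]), i.e. c'h'w' x chw
print((via_matrix - via_conv).abs().max().item())
```

The second printed value should be at the level of floating-point rounding error. Most entries of W are zero, which is why practical implementations work with patches rather than building W explicitly.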

[2] Conceptualize convolution as a fancy average.

A.2 Special operations: fold and unfold

PyTorch offers two useful primitives for working with convolution as matrix operations, fold and unfold. unfold extracts patches (proximate pixel values) into columns. This operation is sometimes also called "im2col". For the specific case of a 2 × 2 kernel, if test is

    tensor([[[[ 0.,  1.,  2.,  3.],
              [ 4.,  5.,  6.,  7.],
              [ 8.,  9., 10., 11.],
              [12., 13., 14., 15.]]]])

then torch.nn.Unfold(kernel_size=2)(test) is

    tensor([[[ 0.,  1.,  2.,  4.,  5.,  6.,  8.,  9., 10.],
             [ 1.,  2.,  3.,  5.,  6.,  7.,  9., 10., 11.],
             [ 4.,  5.,  6.,  8.,  9., 10., 12., 13., 14.],
             [ 5.,  6.,  7.,  9., 10., 11., 13., 14., 15.]]])

where each column contains the values of a 2 × 2 patch. E.g., the first column contains the values 0, 1, 4, and 5, since these are the four values in the upper left of test. The remaining columns show how the sliding and reshaping work. This can be used to implement convolution as matrix multiplication, as in this example:

    import torch

    in_channels = 3
    out_channels = 4
    kernel_size = (2, 3)

    conv = torch.nn.Conv2d(in_channels, out_channels, kernel_size)

    x = torch.randn((1, in_channels, 32, 32))

    # Output of PyTorch convolution
    expected = conv(x)

    # Output of convolution as a matrix product
    unfolded = torch.nn.functional.unfold(x, kernel_size=kernel_size)
    wxb = conv.weight.view(out_channels, -1) @ unfolded + conv.bias.view(1, -1, 1)
    actual = wxb.view(1, out_channels, x.shape[2] - kernel_size[0] + 1, x.shape[3] - kernel_size[1] + 1)

    torch.testing.assert_allclose(actual, expected)

Note that this code assumes a batch size of 1, but your code should work with arbitrary batch sizes. This shows how to evaluate the forward pass of a convolution as a linear operation; evaluating the backward pass of a convolution as the backward pass of a linear operation follows by analogy.
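As a hedged sketch of how both passes might be written for arbitrary batch sizes, the class below treats the unfolded input as the input of a linear layer. It supports only kernel_size and stride, with no padding or dilation; the attribute names weight_grad and bias_grad, the cached cols, and the uniform initialisation bound are assumptions of this sketch rather than requirements.

```python
from torch import empty
from torch.nn.functional import fold, unfold


class Conv2d:
    def __init__(self, in_channels, out_channels, kernel_size, stride=1):
        k = (kernel_size, kernel_size) if isinstance(kernel_size, int) else tuple(kernel_size)
        self.kernel_size, self.stride = k, stride
        bound = 1.0 / (in_channels * k[0] * k[1]) ** 0.5  # assumed initialisation range
        self.weight = empty(out_channels, in_channels, *k).uniform_(-bound, bound)
        self.bias = empty(out_channels).uniform_(-bound, bound)
        self.weight_grad = empty(self.weight.shape).zero_()
        self.bias_grad = empty(self.bias.shape).zero_()

    def forward(self, x):
        n, _, h, w = x.shape
        kh, kw = self.kernel_size
        self.input_hw = (h, w)
        # cols has shape (N, C_in * kh * kw, L), with L the number of patches.
        self.cols = unfold(x, kernel_size=self.kernel_size, stride=self.stride)
        out = self.weight.view(self.weight.shape[0], -1) @ self.cols + self.bias.view(1, -1, 1)
        h_out = (h - kh) // self.stride + 1
        w_out = (w - kw) // self.stride + 1
        return out.view(n, -1, h_out, w_out)

    def backward(self, grad_output):
        n, c_out = grad_output.shape[:2]
        g = grad_output.reshape(n, c_out, -1)  # (N, C_out, L)
        # Linear-layer rules: dW = sum_n g @ cols^T, db = sum of g, dcols = W^T @ g.
        self.weight_grad += (g @ self.cols.transpose(1, 2)).sum(dim=0).view(self.weight.shape)
        self.bias_grad += g.sum(dim=(0, 2))
        grad_cols = self.weight.view(c_out, -1).t() @ g  # (N, C_in * kh * kw, L)
        # fold sums overlapping patch entries back onto the input grid.
        return fold(grad_cols, output_size=self.input_hw,
                    kernel_size=self.kernel_size, stride=self.stride)

    def param(self):
        return [(self.weight, self.weight_grad), (self.bias, self.bias_grad)]


conv = Conv2d(3, 4, (2, 3), stride=2)
x = empty(5, 3, 32, 32).normal_()
y = conv.forward(x)
gx = conv.backward(empty(y.shape).fill_(1.0))
print(y.shape, gx.shape)
```

A useful check during development is to copy weight and bias into a torch.nn.Conv2d, run autograd on the same input, and compare outputs and gradients.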

fold is essentially the inverse of unfold, except that it sums the values of overlapping patches, so each position of fold(unfold(x)) is scaled by the number of patches that contain it. For more on the precise relationship, consult the PyTorch documentation at https://pytorch.org/docs/stable/generated/torch.nn.Fold.html.
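The relationship can be checked on the 4 × 4 example used earlier. In this small sketch, folding an unfolded all-ones image gives the number of patches covering each pixel, and dividing by that count undoes the round trip; the variable names are illustrative.

```python
from torch import arange, empty
from torch.nn.functional import fold, unfold

x = arange(16.0).view(1, 1, 4, 4)
k = 2

# Count how many 2 x 2 patches cover each pixel.
ones = empty(x.shape).fill_(1.0)
divisor = fold(unfold(ones, kernel_size=k), output_size=(4, 4), kernel_size=k)

# fold sums overlapping entries, so the round trip returns x scaled by the count.
round_trip = fold(unfold(x, kernel_size=k), output_size=(4, 4), kernel_size=k)
recovered = round_trip / divisor

print(divisor[0, 0])
print((recovered - x).abs().max().item())
```

Corner pixels belong to one patch, edge pixels to two, and interior pixels to four, which is the pattern the first print should show.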
