Introduction
In this assignment, you’ll get hands-on experience coding and training GANs. This assignment is divided into two parts: in the first part, we will implement a specific type of GAN designed to process images, called a Deep Convolutional GAN (DCGAN). We’ll train the DCGAN to generate emojis from samples of random noise. In the second part, we will implement a more complex GAN architecture called CycleGAN, which was designed for the task of image-to-image translation (described in more detail in Part 2). We’ll train the CycleGAN to convert between Apple-style and Windows-style emojis.
In both parts, you’ll gain experience implementing GANs by writing code for the generator, discriminator, and training loop, for each model.
Part 1: Deep Convolutional GAN (DCGAN)
For the first part of this assignment, we will implement a Deep Convolutional GAN (DCGAN). A DCGAN is simply a GAN that uses a convolutional neural network as the discriminator, and a network composed of transposed convolutions as the generator. To implement the DCGAN, we need to specify three things: 1) the generator, 2) the discriminator, and 3) the training procedure. We will develop each of these three components in the following subsections.
Implement the Discriminator of the DCGAN [10%]
The discriminator in this DCGAN is a convolutional neural network that has the following architecture:
CSC421
Programming Assignment 4
[Figure: Discriminator. 32×32×3 input → conv1 → 16×16×32 → conv2 → 8×8×64 → conv3 → 4×4×128 → conv4 → 1×1×1 output. conv1, conv2, and conv3 are each followed by BatchNorm & ReLU.]
1. Padding: In each of the convolutional layers shown above, we downsample the spatial dimension of the input volume by a factor of 2. Given that we use kernel size K = 5 and stride S = 2, what should the padding be? Write your answer in your writeup, and show your work (e.g., the formula you used to derive the padding).
2. Implementation: Implement this architecture by filling in the __init__ method of the DCDiscriminator class, shown below. Note that the forward pass of DCDiscriminator is already provided for you.
Note: The function conv in Helper Modules has an optional argument batch_norm: if batch_norm is False, then conv simply returns a torch.nn.Conv2d layer; if batch_norm is True, then conv returns a network block that consists of a Conv2d layer followed by a torch.nn.BatchNorm2d layer. Use the conv function in your implementation.
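The output-size arithmetic behind the padding question can be checked numerically. The sketch below is a minimal illustration in plain Python. The helper name conv_output_size is hypothetical (it is not part of the starter code), and it assumes the standard relation floor((W - K + 2P) / S) + 1 for a convolution with dilation 1. The example values are deliberately different from the K = 5, S = 2 setting of the question.

```python
def conv_output_size(width: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution with dilation 1 (hypothetical helper)."""
    return (width - kernel + 2 * padding) // stride + 1


# A 3x3 kernel with stride 1 and padding 1 keeps the spatial size unchanged.
same_size = conv_output_size(32, kernel=3, stride=1, padding=1)

# A 4x4 kernel with stride 2 and padding 1 halves the spatial size.
halved = conv_output_size(32, kernel=4, stride=2, padding=1)

print(same_size, halved)
```

Solving the same relation for P, given the desired output size, is one way to show your work in the writeup.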
Generator [10%]
Now, we will implement the generator of the DCGAN, which consists of a sequence of transpose convolutional layers that progressively upsample the input noise sample to generate a fake image. The generator has the following architecture:
[Figure: Generator. 100×1×1 noise → Linear & reshape → 4×4×128 → upconv1 → 8×8×64 → upconv2 → 16×16×32 → upconv3 → 32×32×3 output. The first three stages are each followed by BatchNorm & ReLU; the output of upconv3 is passed through tanh.]
1. Implementation: Implement this architecture by filling in the __init__ method of the DCGenerator class, shown below. Note that the forward pass of DCGenerator is already provided for you.
Note: The original DCGAN generator uses a deconv function to expand the spatial dimension. Odena et al. later found that deconv creates checkerboard artifacts in the generated samples. In this assignment, we will use upconv, which consists of an upsampling layer followed by a Conv2d layer, in place of the deconv module in your generator implementation (analogous to the conv function used for the discriminator above).
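To make the upsampling half of this block concrete, here is a small sketch of nearest-neighbour upsampling on a 2D grid in plain Python. It assumes nearest-neighbour interpolation with a factor of 2; the helper provided in Helper Modules may use a different mode, and the function name upsample_nearest is hypothetical.

```python
def upsample_nearest(image, factor=2):
    """Nearest-neighbour upsampling of a 2D grid given as a list of rows.

    Assumption: each value is simply repeated `factor` times along both axes.
    """
    out = []
    for row in image:
        # Repeat every value horizontally.
        wide_row = [value for value in row for _ in range(factor)]
        # Repeat the widened row vertically.
        out.extend([list(wide_row) for _ in range(factor)])
    return out


small = [[1, 2],
         [3, 4]]
big = upsample_nearest(small)
for row in big:
    print(row)
```

The convolution that follows the upsampling step then only has to refine the enlarged feature map, so it can keep the spatial size fixed.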
Training Loop [15%]
Next, you will implement the training loop for the DCGAN. A DCGAN is simply a GAN with a specific type of generator and discriminator; thus, we train it in exactly the same way as a standard GAN. The pseudo-code for the training procedure is shown below. The actual implementation is simpler than it may seem from the pseudo-code: this will give you practice in translating math to code.
Algorithm 1 GAN Training Loop Pseudocode
1: procedure TrainGAN
2: Draw m training examples $\{x^{(1)}, \dots, x^{(m)}\}$ from the data distribution $p_{data}$
3: Draw m noise samples $\{z^{(1)}, \dots, z^{(m)}\}$ from the noise distribution $p_z$
4: Generate fake images from the noise: $G(z^{(i)})$ for $i \in \{1, \dots, m\}$
5: Compute the (least-squares) discriminator loss:
$$J^{(D)} = \frac{1}{2m} \sum_{i=1}^{m} \left[ \left( D(x^{(i)}) - 1 \right)^2 \right] + \frac{1}{2m} \sum_{i=1}^{m} \left[ \left( D(G(z^{(i)})) \right)^2 \right]$$
6: Update the parameters of the discriminator
7: Draw m new noise samples $\{z^{(1)}, \dots, z^{(m)}\}$ from the noise distribution $p_z$
8: Generate fake images from the noise: $G(z^{(i)})$ for $i \in \{1, \dots, m\}$
9: Compute the (least-squares) generator loss:
$$J^{(G)} = \frac{1}{m} \sum_{i=1}^{m} \left[ \left( D(G(z^{(i)})) - 1 \right)^2 \right]$$
10: Update the parameters of the generator
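The two least-squares losses in the pseudocode can be evaluated by hand on a tiny batch. The following sketch uses plain Python lists of scalar discriminator outputs rather than tensors. The function names and the example values are hypothetical, and the notebook's own implementation is expected to use PyTorch operations instead.

```python
def discriminator_loss(d_real, d_fake):
    """Least-squares discriminator loss from step 5.

    d_real: discriminator outputs D(x) on real images.
    d_fake: discriminator outputs D(G(z)) on generated images.
    """
    real_term = sum((d - 1.0) ** 2 for d in d_real) / (2 * len(d_real))
    fake_term = sum(d ** 2 for d in d_fake) / (2 * len(d_fake))
    return real_term + fake_term


def generator_loss(d_fake):
    """Least-squares generator loss from step 9."""
    return sum((d - 1.0) ** 2 for d in d_fake) / len(d_fake)


# Made-up discriminator outputs for a batch of m = 2.
d_on_real = [1.0, 0.5]
d_on_fake = [0.0, 0.5]

loss_d = discriminator_loss(d_on_real, d_on_fake)
loss_g = generator_loss(d_on_fake)
print(loss_d, loss_g)
```

The discriminator is rewarded for pushing D(x) toward 1 and D(G(z)) toward 0, while the generator is rewarded for pushing D(G(z)) toward 1.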
1. Implementation: Fill in the gan_training_loop function in the GAN section of the notebook.
There are 5 numbered bullets in the code to fill in for the discriminator and 3 bullets for the generator. Each of these can be done in a single line of code, although you will not lose marks for using multiple lines.
Experiment [10%]
1. We will train a DCGAN to generate Windows (or Apple) emojis in the Training - GAN section of the notebook. By default, the script runs for 5000 iterations, and should take approximately 10 minutes on Colab. The script saves the output of the generator for a fixed noise sample every 200 iterations throughout training; this allows you to see how the generator improves over time. You can stop the training after obtaining satisfactory image samples. Include in your write-up one of the samples from early in training (e.g., iteration 200) and one of the samples from later in training, and give the iteration number for those samples. Briefly comment on the quality of the samples, and in what way they improve through training.
Part 2: CycleGAN
Now we are going to implement the CycleGAN architecture.
Motivation: Image-to-Image Translation
Say you have a picture of a sunny landscape, and you wonder what it would look like in the rain. Or perhaps you wonder what a painter like Monet or van Gogh would see in it? These questions can be addressed through image-to-image translation wherein an input image is automatically converted into a new image with some desired appearance.
Recently, Generative Adversarial Networks have been successfully applied to image translation, and have sparked a resurgence of interest in the topic. The basic idea behind the GAN-based approaches is to use a conditional GAN to learn a mapping from input to output images. The loss functions of these approaches generally include extra terms (in addition to the standard GAN loss), to express constraints on the types of images that are generated.
A recently-introduced method for image-to-image translation called CycleGAN is particularly interesting because it allows us to use un-paired training data. This means that in order to train it to translate images from domain X to domain Y, we do not have to have exact correspondences between individual images in those domains. For example, in the paper that introduced CycleGANs, the authors are able to translate between images of horses and zebras, even though there are no images of a zebra in exactly the same position as a horse, and with exactly the same background, etc.
Thus, CycleGANs enable learning a mapping from one domain X (say, images of horses) to another domain Y (images of zebras) without having to find perfectly matched training pairs.
To summarize the differences between paired and un-paired data, we have:
• Paired training data: $\{(x^{(i)}, y^{(i)})\}_{i=1}^{N}$
• Un-paired training data:
  - Source set: $\{x^{(i)}\}_{i=1}^{N}$ with each $x^{(i)} \in X$
  - Target set: $\{y^{(j)}\}_{j=1}^{M}$ with each $y^{(j)} \in Y$
  - For example, X is the set of horse pictures, and Y is the set of zebra pictures, where there are no direct correspondences between images in X and Y
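The practical difference shows up in how minibatches are drawn. The sketch below is a hypothetical illustration in plain Python: the file names, set sizes, and function names are made up, and the notebook's data loaders may work differently. With paired data, each sample is an (x, y) tuple; with un-paired data, the two sets can have different sizes and are sampled independently.

```python
import random


def paired_batch(pairs, batch_size, rng):
    """Sample aligned (x, y) tuples from a paired dataset."""
    return rng.sample(pairs, batch_size)


def unpaired_batch(source, target, batch_size, rng):
    """Sample x and y independently; no correspondence between them is assumed."""
    xs = rng.sample(source, batch_size)
    ys = rng.sample(target, batch_size)
    return xs, ys


rng = random.Random(0)
horses = [f"horse_{i}.png" for i in range(5)]   # N = 5 (made-up names)
zebras = [f"zebra_{j}.png" for j in range(8)]   # M = 8 (made-up names)

xs, ys = unpaired_batch(horses, zebras, batch_size=3, rng=rng)
print(xs, ys)
```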
Emoji CycleGAN
Now we’ll build a CycleGAN and use it to translate emojis between two different styles, in particular, Windows ↔ Apple emojis.
Generator [20%]
The generator in the CycleGAN has layers that implement three stages of computation: 1) the first stage encodes the input via a series of convolutional layers that extract the image features; 2) the second stage then transforms the features by passing them through one or more residual blocks; and 3) the third stage decodes the transformed features using a series of transpose convolutional layers, to build an output image of the same size as the input.
The residual block used in the transformation stage consists of a convolutional layer, where the input is added to the output of the convolution. This is done so that the characteristics of the output image (e.g., the shapes of objects) do not differ too much from the input.
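The skip connection can be illustrated without any deep-learning library. In the sketch below, a residual block is modelled as x + layer(x) on a flat list of feature values. The names residual_block, zero_layer, and small_layer are hypothetical stand-ins, and the layer argument takes the place of the convolution inside the provided ResnetBlock class.

```python
def residual_block(x, layer):
    """Return x + layer(x), element-wise (stand-in for a convolutional residual block)."""
    fx = layer(x)
    return [xi + fi for xi, fi in zip(x, fx)]


def zero_layer(values):
    """A layer that outputs zeros: the block then reduces to the identity."""
    return [0.0 for _ in values]


def small_layer(values):
    """A layer with a small output: the block only nudges its input."""
    return [0.1 * v for v in values]


features = [0.5, -1.0, 2.0]
unchanged = residual_block(features, zero_layer)
nudged = residual_block(features, small_layer)
print(unchanged, nudged)
```

Because the input is carried through unchanged, the layer only has to learn a correction, which is why the output tends to stay close to the input.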
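[Figure: CycleGAN overview. An Apple emoji is translated by GXtoY into a Windows emoji, which GYtoX translates back into an Apple emoji; each generator is drawn as a stack of conv and upconv layers. The discriminator DY takes the generated image and outputs a value in [0, 1] answering: does the generated image look like it came from the set of Windows emojis?]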
Implement the following generator architecture by completing the __init__ method of the CycleGenerator class.
To do this, you will need to use the conv and upconv functions, as well as the ResnetBlock class, all provided in Helper Modules.
[Figure: CycleGAN Generator. 32×32×3 input → conv1 → 16×16×32 → conv2 → 8×8×64 → Residual block → 8×8×64 → upconv1 → 16×16×32 → upconv2 → 32×32×3 output. The first four stages are each followed by BatchNorm & ReLU; the output of upconv2 is passed through tanh.]
Note: There are two generators in the CycleGAN model, $G_{X \to Y}$ and $G_{Y \to X}$, but their implementations are identical. Thus, in the code, $G_{X \to Y}$ and $G_{Y \to X}$ are simply different instantiations of the same class.
CycleGAN Training Loop [20%]
Finally, we will implement the CycleGAN training procedure, which is more involved than the procedure in Part 1.
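Algorithm 2 CycleGAN Training Loop Pseudocode
1: procedure TrainCycleGAN
2: Draw a minibatch of samples $\{x^{(1)}, \dots, x^{(m)}\}$ from domain X
3: Draw a minibatch of samples $\{y^{(1)}, \dots, y^{(n)}\}$ from domain Y
4: Compute the discriminator loss on real images:
$$J_{real}^{(D)} = \frac{1}{m} \sum_{i=1}^{m} \left( D_X(x^{(i)}) - 1 \right)^2 + \frac{1}{n} \sum_{j=1}^{n} \left( D_Y(y^{(j)}) - 1 \right)^2$$
5: Compute the discriminator loss on fake images:
$$J_{fake}^{(D)} = \frac{1}{m} \sum_{i=1}^{m} \left( D_Y(G_{X \to Y}(x^{(i)})) \right)^2 + \frac{1}{n} \sum_{j=1}^{n} \left( D_X(G_{Y \to X}(y^{(j)})) \right)^2$$
6: Update the discriminators
7: Compute the Y → X generator loss:
$$J^{(G_{Y \to X})} = \frac{1}{n} \sum_{j=1}^{n} \left( D_X(G_{Y \to X}(y^{(j)})) - 1 \right)^2 + \lambda_{cycle} J_{cycle}^{(Y \to X \to Y)}$$
8: Compute the X → Y generator loss:
$$J^{(G_{X \to Y})} = \frac{1}{m} \sum_{i=1}^{m} \left( D_Y(G_{X \to Y}(x^{(i)})) - 1 \right)^2 + \lambda_{cycle} J_{cycle}^{(X \to Y \to X)}$$
9: Update the generators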
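The symmetry between the two discriminators is easy to see when the real and fake losses are written with a shared helper. The sketch below works on plain Python lists of scalar discriminator outputs. All function names and the example values are hypothetical, and the notebook's implementation is expected to use PyTorch tensors.

```python
def mean_squared(values, target):
    """Mean of (v - target)^2 over a batch of scalar discriminator outputs."""
    return sum((v - target) ** 2 for v in values) / len(values)


def d_real_loss(dx_on_real_x, dy_on_real_y):
    """Discriminator loss on real images: both outputs are pushed toward 1."""
    return mean_squared(dx_on_real_x, 1.0) + mean_squared(dy_on_real_y, 1.0)


def d_fake_loss(dy_on_fake_y, dx_on_fake_x):
    """Discriminator loss on fake images: both outputs are pushed toward 0."""
    return mean_squared(dy_on_fake_y, 0.0) + mean_squared(dx_on_fake_x, 0.0)


# Made-up outputs; the X and Y batches are allowed to differ in size.
real_loss = d_real_loss([1.0, 1.0], [0.5])
fake_loss = d_fake_loss([0.5, 0.5], [0.0, 1.0])
print(real_loss, fake_loss)
```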
Similarly to Part 1, this training loop is not as difficult to implement as it may seem. There is a lot of symmetry in the training procedure, because all operations are done for both the X → Y and Y → X directions. Complete the cyclegan_training_loop function, starting from the following section:
There are 5 bullet points in the code for training the discriminators, and 6 bullet points in total for training the generators. Due to the symmetry between domains, several parts of the code you fill in will be identical except for swapping X and Y; this is normal and expected.
Cycle Consistency
The most interesting idea behind CycleGANs (and the one from which they get their name) is the idea of introducing a cycle consistency loss to constrain the model. The idea is that when we translate an image from domain X to domain Y, and then translate the generated image back to domain X, the result should look like the original image that we started with.
The cycle consistency component of the loss is the L1 distance between the input images and their reconstructions obtained by passing through both generators in sequence (e.g., from domain X to Y via the X → Y generator, and then from domain Y back to X via the Y → X generator). The cycle consistency loss for the Y → X → Y cycle is expressed as follows:
$$\lambda_{cycle} J_{cycle}^{(Y \to X \to Y)} = \lambda_{cycle} \frac{1}{n} \sum_{j=1}^{n} \left\| y^{(j)} - G_{X \to Y}(G_{Y \to X}(y^{(j)})) \right\|_1,$$
where $\lambda_{cycle}$ is a scalar hyper-parameter balancing the two loss terms: the cycle consistency loss and the GAN loss. The loss for the X → Y → X cycle is analogous.
Implement the cycle consistency loss by filling in the following section in the CycleGAN training loop.
Note that there are two such sections, and their implementations are identical except for swapping X and Y. You must implement both of them.
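As a worked example of the cycle term, the sketch below computes the batch-averaged L1 distance between originals and reconstructions, and then combines it with the least-squares GAN term using a weight lambda_cycle. Images are represented as flat Python lists; the function names and all numbers are hypothetical. The sum of absolute differences per image follows the L1 norm as written in the formula, whereas an implementation could reasonably average over pixels instead.

```python
def l1_cycle_loss(originals, reconstructions):
    """Batch mean of the per-image L1 distance between originals and reconstructions."""
    total = 0.0
    for orig, recon in zip(originals, reconstructions):
        total += sum(abs(o - r) for o, r in zip(orig, recon))
    return total / len(originals)


def generator_total_loss(d_on_fake, cycle_loss, lambda_cycle):
    """Least-squares GAN term plus the weighted cycle consistency term."""
    gan_term = sum((d - 1.0) ** 2 for d in d_on_fake) / len(d_on_fake)
    return gan_term + lambda_cycle * cycle_loss


# Two made-up 2-pixel "images" and their reconstructions after a full cycle.
ys = [[0.0, 1.0], [0.5, 0.5]]
ys_reconstructed = [[0.0, 0.5], [0.5, 1.0]]

cycle = l1_cycle_loss(ys, ys_reconstructed)
total = generator_total_loss([1.0, 0.5], cycle, lambda_cycle=0.5)
print(cycle, total)
```

Setting lambda_cycle to 0 removes the cycle term entirely, leaving only the GAN term.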
CycleGAN Experiments [15%]
1. Train the CycleGAN to translate Apple emojis to Windows emojis in the Training - CycleGAN section of the notebook. The script will train for 10,000 iterations and save generated samples in the samples_cyclegan folder. In each sample, images from the source domain are shown with their translations to the right.
Include in your writeup the samples from both generators at iteration 200 and samples from a later iteration.
2. Change the random seed and train the CycleGAN again. What are the most noticeable differences between samples of similar quality from the different random seeds? Explain why there is such a difference.
3. Change the default lambda_cycle hyperparameter and train the CycleGAN again. Try a couple of different values, including training without the cycle-consistency loss (i.e., lambda_cycle = 0).
For different values of lambda_cycle, include in your writeup some samples from both generators at iteration 200 and samples from a later iteration. Do you notice a difference between the results with and without the cycle consistency loss? Write down your observations (positive or negative) in your writeup. Can you explain these results, i.e., why there is or isn’t a difference among the experiments?
What you need to submit
Your code file: cycle_gan.ipynb.
A PDF document titled a4-writeup.pdf containing samples generated by your DCGAN and CycleGAN models, and your answers to the written questions.
Further Resources
For further reading on GANs in general, and CycleGANs in particular, the following links may be useful:
1. Deconvolution and Checkerboard Artifacts (Odena et al., 2016)
2. Unpaired image-to-image translation using cycle-consistent adversarial networks (Zhu et al., 2017)
3. Generative Adversarial Nets (Goodfellow et al., 2014)
4. An Introduction to GANs in TensorFlow
5. Generative Models Blog Post from OpenAI
6. Official PyTorch Implementations of Pix2Pix and CycleGAN