
GAN & Ridge regression

    • Implementation: GAN (55 pts)

In this part, you are expected to implement a GAN on the MNIST dataset. We have provided a base Jupyter notebook (gan-base.ipynb) for you to start with; it provides the model setup and training configuration for training a GAN on MNIST.

    (a) Implement the training loop and report the learning curves and generated images at epochs 1, 50, and 100. Note that plotting the learning curves and visualizing the generated images are already implemented in the provided Jupyter notebook. (20 pts)


Procedure 1 Training GAN, modified from Goodfellow et al. (2014)


Input: m: real data batch size, nz: fake data batch size
Output: Discriminator D, Generator G

for number of training iterations do

    • Training the discriminator
Sample a minibatch of nz noise samples {z(1), z(2), · · · , z(nz)} from the noise prior pg(z)
Sample a minibatch of m real examples {x(1), x(2), · · · , x(m)} from the data distribution
Update the discriminator by ascending its stochastic gradient:

\[
\nabla_{\theta_d} \left[ \frac{1}{m} \sum_{i=1}^{m} \log D(x^{(i)}) + \frac{1}{n_z} \sum_{i=1}^{n_z} \log\left(1 - D(G(z^{(i)}))\right) \right]
\]

    • Training the generator
Sample a minibatch of nz noise samples {z(1), z(2), · · · , z(nz)} from the noise prior pg(z)
Update the generator by ascending its stochastic gradient:

\[
\nabla_{\theta_g} \frac{1}{n_z} \sum_{i=1}^{n_z} \log D(G(z^{(i)}))
\]

end for

    • The gradient-based updates can use any standard gradient-based learning rule. The base code uses the Adam optimizer (Kingma and Ba, 2014). A minimal sketch of one training iteration appears below.
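
The following is only an illustrative PyTorch sketch of one iteration of Procedure 1, not the required solution. The names netD, netG, optD, optG, and latent_dim are assumptions and may differ from those in gan-base.ipynb; it also assumes the discriminator ends in a sigmoid and outputs a probability of shape (batch, 1).

import torch
import torch.nn as nn

# Minimal sketch of one iteration of Procedure 1 (hypothetical setup): netD maps
# images to a probability in (0, 1), netG maps latent vectors of size latent_dim
# to images, optD/optG are Adam optimizers as in the base code.
criterion = nn.BCELoss()

def train_one_iteration(netD, netG, optD, optG, real_x, latent_dim, device):
    m = real_x.size(0)                       # real-data batch size
    nz = m                                   # fake-data batch size (equal here)
    real_label = torch.ones(m, 1, device=device)
    fake_label = torch.zeros(nz, 1, device=device)

    # Discriminator step: ascend (1/m) sum log D(x) + (1/nz) sum log(1 - D(G(z)))
    optD.zero_grad()
    z = torch.randn(nz, latent_dim, device=device)
    fake_x = netG(z).detach()                # block gradients into the generator
    lossD = criterion(netD(real_x), real_label) + criterion(netD(fake_x), fake_label)
    lossD.backward()
    optD.step()

    # Generator step: ascend (1/nz) sum log D(G(z)) by flipping labels to "real"
    optG.zero_grad()
    z = torch.randn(nz, latent_dim, device=device)
    lossG = criterion(netD(netG(z)), real_label)
    lossG.backward()
    optG.step()
    return lossD.item(), lossG.item()

Minimizing binary cross-entropy against the flipped labels is equivalent to ascending log D(G(z)), which is the non-saturating generator update used in part (a).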


The expected results are as follows.

Figure 1: Learning curve

(a) epoch 1    (b) epoch 50    (c) epoch 100

Figure 2: Generated images by G

Solution goes here.

    (b) Replace the generator update rule with the original one from the slides, “Update the generator by descending its stochastic gradient:”

\[
\nabla_{\theta_g} \frac{1}{n_z} \sum_{i=1}^{n_z} \log\left(1 - D(G(z^{(i)}))\right),
\]

and report the learning curves and generated images at epochs 1, 50, and 100. Compare the results with (a).

Note that this may not work. If training does not work, explain why. A sketch of the loss change follows.    (10 pts)
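
A minimal sketch of how the generator step could be switched to the saturating loss, reusing the hypothetical names (netD, netG, optG, latent_dim) from the sketch after Procedure 1; this is illustrative, not the required solution.

import torch

def generator_step_saturating(netD, netG, optG, nz, latent_dim, device, eps=1e-8):
    # Descend (1/nz) sum log(1 - D(G(z))): the loss below is exactly that quantity,
    # so a standard optimizer step minimizes it. eps guards against log(0).
    optG.zero_grad()
    z = torch.randn(nz, latent_dim, device=device)
    lossG = torch.log(1.0 - netD(netG(z)) + eps).mean()
    lossG.backward()
    optG.step()
    return lossG.item()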

Solution goes here.

    (c) Other than the method we used in (a), how can we improve GAN training? Implement one such improvement and report the learning curves and generated images at epochs 1, 50, and 100.    (10 pts)
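
As one example among several possible answers (and not necessarily the intended one), one-sided label smoothing replaces the discriminator's real-data target of 1.0 with a softer value such as 0.9. A sketch, reusing the hypothetical names from the earlier sketches:

import torch
import torch.nn as nn

criterion = nn.BCELoss()

def discriminator_step_smoothed(netD, netG, optD, real_x, latent_dim, device):
    # One-sided label smoothing: real targets are 0.9 instead of 1.0, which keeps
    # the discriminator from becoming overconfident; fake targets stay at 0.
    m = real_x.size(0)
    optD.zero_grad()
    real_label = torch.full((m, 1), 0.9, device=device)
    fake_label = torch.zeros(m, 1, device=device)
    z = torch.randn(m, latent_dim, device=device)
    lossD = criterion(netD(real_x), real_label) + criterion(netD(netG(z).detach()), fake_label)
    lossD.backward()
    optD.step()
    return lossD.item()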

Solution goes here.










    • Ridge regression [20 pts]

Derive the closed-form solution in matrix form for the ridge regression problem:


\[
\min_{\beta} \; \frac{1}{n} \sum_{i=1}^{n} \left( z_i^{\top} \beta - y_i \right)^2 + \lambda \, \|\beta\|_A^2
\]

where ∥β∥²_A := β⊤Aβ and

\[
A = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
\]
This A matrix has the effect of not regularizing the bias β0, which is standard practice in ridge regression. Note: derive the closed-form solution; do not blindly copy the lecture notes.
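
Once you have a candidate closed form, a quick numerical sanity check is to compare it against plain gradient descent on the same objective. The NumPy sketch below is only such a check, run on synthetic data; the candidate expression it uses is what one obtains by setting the gradient of the objective to zero, and it does not replace the derivation the question asks for.

import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 3                                   # 3 coefficients, matching the 3x3 A
Z = np.hstack([np.ones((n, 1)), rng.normal(size=(n, d - 1))])   # first column = bias
beta_true = np.array([2.0, -1.0, 0.5])
y = Z @ beta_true + 0.1 * rng.normal(size=n)

lam = 0.3
A = np.diag([0.0, 1.0, 1.0])                    # leaves the bias beta_0 unpenalized

# Candidate closed form (from setting the gradient of the objective to zero):
# beta_hat = (Z^T Z + n * lambda * A)^{-1} Z^T y
beta_closed = np.linalg.solve(Z.T @ Z + n * lam * A, Z.T @ y)

# Plain gradient descent on (1/n)||Z beta - y||^2 + lambda * beta^T A beta
beta_gd = np.zeros(d)
for _ in range(50_000):
    grad = (2 / n) * Z.T @ (Z @ beta_gd - y) + 2 * lam * A @ beta_gd
    beta_gd -= 1e-3 * grad

print(np.allclose(beta_closed, beta_gd, atol=1e-4))   # should print True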

Solution goes here.

    • Review of the change of variables in probability density functions [25 pts]

In flow-based generative models, we have seen pθ(x) = p(fθ(x)) |∂fθ(x)/∂x|. As a hands-on (fixed-parameter) example, consider the following setting.


Let X and Y be independent standard normal random variables. Consider the transformation U = X + Y and V = X − Y. In the notation used above, U = g1(X, Y) where g1(x, y) = x + y, and V = g2(X, Y) where g2(x, y) = x − y. The joint pdf of X and Y is fX,Y(x, y) = (2π)−1 exp(−x²/2) exp(−y²/2), −∞ < x < ∞, −∞ < y < ∞. Then u and v are determined by x and y, i.e.

\[
\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}.
\]
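
Before working through parts (a)–(c) by hand, you may find it useful to check your algebra symbolically. The following is a small, optional SymPy sketch under the setting above; it is only a checking aid, not a substitute for the derivations the question asks for.

import sympy as sp

x, y, u, v = sp.symbols('x y u v', real=True)

# Joint pdf of X, Y (independent standard normals)
f_xy = sp.exp(-x**2 / 2) * sp.exp(-y**2 / 2) / (2 * sp.pi)

# Invert the transformation u = x + y, v = x - y
sol = sp.solve([sp.Eq(u, x + y), sp.Eq(v, x - y)], [x, y], dict=True)[0]
x_uv, y_uv = sol[x], sol[y]                      # x = (u + v)/2, y = (u - v)/2

# Jacobian J = d(x, y)/d(u, v) and its determinant
J = sp.Matrix([[sp.diff(x_uv, u), sp.diff(x_uv, v)],
               [sp.diff(y_uv, u), sp.diff(y_uv, v)]])
detJ = sp.simplify(J.det())                      # expect -1/2, so |det J| = 1/2

# Joint pdf of U, V via f_{U,V}(u, v) = f_{X,Y}(x(u,v), y(u,v)) |det J|
f_uv = sp.simplify(f_xy.subs({x: x_uv, y: y_uv}) * sp.Abs(detJ))
print(f_uv)                                      # expect exp(-(u**2 + v**2)/4) / (4*pi)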




















(a) Compute the Jacobian matrix

\[
J = \begin{pmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial x}{\partial v} \\ \dfrac{\partial y}{\partial u} & \dfrac{\partial y}{\partial v} \end{pmatrix}
\]

(5 pts)
Solution goes here.















(b) (Forward) Show that the joint pdf of U, V is

\[
f_{U,V}(u, v) = \frac{1}{2\sqrt{\pi}} \exp(-u^2/4) \cdot \frac{1}{2\sqrt{\pi}} \exp(-v^2/4)
\]

(Hint: fU,V(u, v) = fX,Y(?, ?) |det(J)|)

(10 pts)
Solution goes here.















(c) (Inverse) Check whether the following equation holds or not:

\[
f_{X,Y}(x, y) = f_{U,V}(x + y, x - y) \, \left|\det(J)^{-1}\right|
\]

(10 pts)

Solution goes here.

References

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2014). Generative adversarial nets. Advances in Neural Information Processing Systems, 27.

Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
