1. [2 points] KL Divergence
(a) [1 point] What is the expression for the KL divergence DKL(q(x) || p(x)) given two continuous distributions p(x) and q(x) defined on the domain ℝ¹?
Your answer:
(b) [1 point] Show that the KL divergence is non-negative. You can use Jensen’s inequality here without proving it.
Your answer:
2. [3 points] In class, we derived the following equality:
log pθ(x) = ∫_z qφ(z|x) log [ pθ(x, z) / qφ(z|x) ] dz + ∫_z qφ(z|x) log [ qφ(z|x) / pθ(z|x) ] dz
Instead of maximizing the log likelihood log pθ(x) w.r.t. θ, we find a lower bound for log pθ(x) and maximize the lower bound.
(a) [1 point] Use the above equation and your result in 1(b) to give a lower bound for log pθ(x).
Your answer:
(b) [1 point] What do people usually call the bound?
Your answer:
(c) [1 point] Under what condition is the bound tight?
Your answer:
3. [2 points] Given z ∈ ℝ¹, p(z) ∼ N(0, 1) and q(z|x) ∼ N(µz, σz²), write DKL(q(z|x) || p(z)) in terms of σz and µz.
Your answer:
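As a sanity check for whatever expression you derive, the KL can also be estimated by Monte Carlo. The sketch below assumes NumPy and SciPy are available; the values of µz and σz are arbitrary examples.

```python
# Monte Carlo estimate of D_KL(q(z|x) || p(z)) for univariate Gaussians,
# usable as a numerical check against a derived closed-form expression.
import numpy as np
from scipy.stats import norm

mu_z, sigma_z = 0.7, 1.3                                    # arbitrary example values
z = np.random.normal(mu_z, sigma_z, size=1_000_000)         # samples from q(z|x) = N(mu_z, sigma_z^2)
mc_kl = np.mean(norm.logpdf(z, mu_z, sigma_z) - norm.logpdf(z, 0.0, 1.0))
print(f"Monte Carlo estimate of the KL: {mc_kl:.4f}")
```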
4. [1 point] In VAEs, the encoder computes the mean µz and the variance σz² of qφ(z|x), assuming qφ(z|x) is Gaussian. Explain why we usually model σz² in log space, i.e., model log σz² instead of σz², when implementing it with neural nets.
Your answer:
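For reference, here is a minimal sketch (assuming PyTorch; the names EncoderHead, fc_mu, and fc_logvar are illustrative, not from the assignment) of how the log-variance parameterization typically appears in code:

```python
# Encoder output head that predicts mu_z and log sigma_z^2 for q_phi(z|x).
import torch
import torch.nn as nn

class EncoderHead(nn.Module):
    def __init__(self, hidden_dim=128, latent_dim=16):
        super().__init__()
        self.fc_mu = nn.Linear(hidden_dim, latent_dim)      # outputs mu_z directly
        self.fc_logvar = nn.Linear(hidden_dim, latent_dim)  # outputs log sigma_z^2 (unconstrained real values)

    def forward(self, h):
        mu = self.fc_mu(h)
        logvar = self.fc_logvar(h)       # any real number is a valid log-variance
        var = torch.exp(logvar)          # exp(.) maps it back to a positive variance
        return mu, var
```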
5. [1 point] Why do we need the reparameterization trick when training VAEs instead of directly sampling from the latent distribution N(µz, σz²)?
Your answer:
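For reference, a minimal sketch of the trick as it is typically implemented (assuming PyTorch; the function name reparameterize is illustrative):

```python
# Reparameterization: write z as a deterministic function of (mu, logvar) plus
# an independent standard-normal noise sample.
import torch

def reparameterize(mu, logvar):
    std = torch.exp(0.5 * logvar)    # sigma_z from log sigma_z^2
    eps = torch.randn_like(std)      # eps ~ N(0, I), independent of the parameters
    return mu + eps * std            # z = mu + sigma * eps, differentiable w.r.t. mu and logvar
```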