Homework #10 Solution

1.  [2 points] KL Divergence


(a)  [1 point] What is the expression for the KL divergence $D_{KL}(q(x)\,\|\,p(x))$ given two continuous distributions $p(x)$ and $q(x)$ defined on the domain $\mathbb{R}^1$?


Your answer:

(b)  [1 point] Show that the KL divergence is non-negative. You can use Jensen’s inequality here without  proving it.


Your answer:


2.  [3 points] In class, we derived the following equality:

$$\log p_\theta(x) = \int_z q_\phi(z|x) \log \frac{p_\theta(x, z)}{q_\phi(z|x)} \, dz + \int_z q_\phi(z|x) \log \frac{q_\phi(z|x)}{p_\theta(z|x)} \, dz$$


Instead of maximizing the log-likelihood $\log p_\theta(x)$ w.r.t. $\theta$, we find a lower bound for $\log p_\theta(x)$ and maximize the lower bound.



(a)  [1 point] Use the above equation and your result in 1(b) to give a lower bound for $\log p_\theta(x)$.

Your answer:

(b)  [1 point] What is this bound usually called?


Your answer:

(c)  [1 point] Under what condition is the bound tight?


Your answer:
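
A minimal numerical sketch of the decomposition in this problem, assuming a discrete latent variable so that the integrals over $z$ become sums. The joint values and the choice of `q` are made up for illustration and do not come from the course material. It evaluates both terms of the equality for one fixed $x$ and then repeats the computation with `q` set to the exact posterior.

```python
import math

def elbo_and_kl(joint, q):
    """Return (first term, second term, log p(x)) of the equality above.

    joint[k] is p(x, z=k) for one fixed x; q[k] is q(z=k | x).
    Sums replace the integrals because z is discrete in this toy setting.
    """
    p_x = sum(joint)
    posterior = [j / p_x for j in joint]
    # First term: sum_z q(z|x) log( p(x, z) / q(z|x) )
    first = sum(qk * math.log(jk / qk) for qk, jk in zip(q, joint))
    # Second term: sum_z q(z|x) log( q(z|x) / p(z|x) ), a KL divergence
    second = sum(qk * math.log(qk / pk) for qk, pk in zip(q, posterior))
    return first, second, math.log(p_x)

joint = [0.10, 0.25, 0.05]  # hypothetical p(x, z) for z = 0, 1, 2
q = [0.5, 0.3, 0.2]         # hypothetical approximate posterior

elbo, kl, log_px = elbo_and_kl(joint, q)

# Same computation with q equal to the exact posterior p(z|x).
posterior = [j / sum(joint) for j in joint]
elbo_tight, kl_tight, _ = elbo_and_kl(joint, posterior)

print(f"first term = {elbo:.6f}, second term = {kl:.6f}, log p(x) = {log_px:.6f}")
print(f"with q = posterior: first term = {elbo_tight:.6f}, second term = {kl_tight:.6f}")
```

Under these assumptions the two terms add up to $\log p_\theta(x)$ for any valid `q`, and the second term vanishes when `q` matches the posterior.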


3.  [2 points] Given $z \in \mathbb{R}^1$, $p(z) \sim \mathcal{N}(0, 1)$ and $q(z|x) \sim \mathcal{N}(\mu_z, \sigma_z^2)$, write $D_{KL}(q(z|x)\,\|\,p(z))$ in terms of $\sigma_z$ and $\mu_z$.


Your answer:
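
One way to sanity-check a closed-form answer here is to compare it with direct numerical integration of the KL definition. The sketch below assumes the standard closed form $\tfrac{1}{2}(\sigma_z^2 + \mu_z^2 - 1 - \log \sigma_z^2)$ and uses a plain trapezoidal rule; the function names, grid size, and test values are illustrative choices, not part of the assignment.

```python
import math

def kl_closed_form(mu, sigma):
    """Closed-form KL( N(mu, sigma^2) || N(0, 1) )."""
    var = sigma ** 2
    return 0.5 * (var + mu ** 2 - 1.0 - math.log(var))

def log_normal_pdf(z, mu, sigma):
    """Log density of N(mu, sigma^2) at z."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (z - mu) ** 2 / (2.0 * sigma ** 2)

def kl_numeric(mu, sigma, n=20001, width=10.0):
    """Trapezoidal estimate of the integral of q(z) * (log q(z) - log p(z))."""
    lo, hi = mu - width * sigma, mu + width * sigma
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        z = lo + i * h
        log_q = log_normal_pdf(z, mu, sigma)
        log_p = log_normal_pdf(z, 0.0, 1.0)
        weight = 0.5 if i in (0, n - 1) else 1.0  # trapezoid end-point weights
        total += weight * math.exp(log_q) * (log_q - log_p)
    return total * h

mu_z, sigma_z = 0.7, 0.5  # arbitrary test values
closed = kl_closed_form(mu_z, sigma_z)
numeric = kl_numeric(mu_z, sigma_z)
print(f"closed form = {closed:.8f}, numeric = {numeric:.8f}")
```

The densities are handled in log space so that the integrand never evaluates `log(0)` in the tails.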


4.  [1 point] In VAEs, the encoder computes the mean $\mu_z$ and the variance $\sigma_z^2$ of $q_\phi(z|x)$, assuming $q_\phi(z|x)$ is Gaussian. Explain why we usually model $\sigma_z^2$ in log space, i.e., model $\log \sigma_z^2$ instead of $\sigma_z^2$, when implementing it with neural nets.


Your answer:
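
A small sketch of what the log-space parameterization looks like in practice, assuming the encoder's last layer produces an unconstrained real number that is interpreted as $\log \sigma_z^2$. The raw output values are hypothetical, and the KL expression is the standard closed form against $\mathcal{N}(0, 1)$ rewritten in terms of the log-variance.

```python
import math

def variance_from_log_var(log_var):
    """Map an unconstrained real output to a variance; exp() is always positive."""
    return math.exp(log_var)

def kl_from_encoder_outputs(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, 1) ) written directly in terms of log sigma^2."""
    return 0.5 * (math.exp(log_var) + mu ** 2 - 1.0 - log_var)

# Hypothetical raw network outputs: any real number is a valid log-variance.
raw_outputs = [-8.0, -0.5, 0.0, 3.0]
variances = [variance_from_log_var(s) for s in raw_outputs]
kls = [kl_from_encoder_outputs(0.0, s) for s in raw_outputs]

for s, v, k in zip(raw_outputs, variances, kls):
    print(f"log_var = {s:+.1f} -> variance = {v:.6f}, KL = {k:.6f}")
```

With this choice no clamping or positivity constraint is needed on the network output, and the $\log \sigma_z^2$ term in the KL is just the raw output itself.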


5.  [1 point] Why do we need the reparameterization trick when training VAEs instead of directly sampling from the latent distribution $\mathcal{N}(\mu_z, \sigma_z^2)$?


Your answer:
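
The following sketch illustrates the reparameterization $z = \mu_z + \sigma_z \epsilon$ with $\epsilon \sim \mathcal{N}(0, 1)$ on a toy objective, assuming $f(z) = z^2$ so that the exact gradients are known. The objective, sample count, and function name are illustrative assumptions; a real VAE would rely on an autodiff framework rather than hand-written derivatives.

```python
import random

def reparameterized_gradients(mu, sigma, n_samples=100_000, seed=0):
    """Monte Carlo gradients of E[f(z)] w.r.t. mu and sigma for f(z) = z^2.

    The sample is written as z = mu + sigma * eps, so the randomness lives
    in eps and z is a differentiable function of (mu, sigma).
    """
    rng = random.Random(seed)
    grad_mu = 0.0
    grad_sigma = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)   # noise that does not depend on mu or sigma
        z = mu + sigma * eps
        df_dz = 2.0 * z             # derivative of the toy objective f(z) = z^2
        grad_mu += df_dz            # chain rule: dz/dmu = 1
        grad_sigma += df_dz * eps   # chain rule: dz/dsigma = eps
    return grad_mu / n_samples, grad_sigma / n_samples

mu, sigma = 1.0, 0.5
g_mu, g_sigma = reparameterized_gradients(mu, sigma)

# Since E[z^2] = mu^2 + sigma^2, the exact gradients are 2*mu and 2*sigma.
print(f"estimated d/dmu    = {g_mu:.4f} (exact {2 * mu:.4f})")
print(f"estimated d/dsigma = {g_sigma:.4f} (exact {2 * sigma:.4f})")
```

If `z` were instead drawn directly with `rng.gauss(mu, sigma)`, the sampled value would carry no explicit functional dependence on `mu` or `sigma` through which the chain rule could be applied.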
