Homework #10 Solution

1. [2 points] KL Divergence

 

(a) [1 point] What is the expression of the KL divergence D_KL(q(x)||p(x)) for two continuous distributions p(x) and q(x) defined on the domain R^1?

 

Your answer:
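For reference, the standard definition for two continuous densities q(x) and p(x) on R^1 (assuming q(x) = 0 wherever p(x) = 0) is

D_KL(q(x)||p(x)) = ∫ q(x) log [ q(x) / p(x) ] dx.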

(b) [1 point] Show that the KL divergence is non-negative. You can use Jensen's inequality here without proving it.

 

Your answer:
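A sketch of the usual argument, using Jensen's inequality for the convex function −log:

D_KL(q||p) = E_q[ −log( p(x)/q(x) ) ] ≥ −log E_q[ p(x)/q(x) ] = −log ∫ q(x) · [ p(x)/q(x) ] dx = −log ∫ p(x) dx = −log 1 = 0.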

 

2. [3 points] In class, we derived the following equality:

log p_θ(x) = ∫_z q_φ(z|x) log [ p_θ(x, z) / q_φ(z|x) ] dz + ∫_z q_φ(z|x) log [ q_φ(z|x) / p_θ(z|x) ] dz

 

Instead of maximizing the log likelihood log p_θ(x) w.r.t. θ, we find a lower bound for log p_θ(x) and maximize the lower bound.

 

 

(a) [1 point] Use the above equation and your result in 1(b) to give a lower bound for log p_θ(x).

Your answer:
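A sketch of how the bound follows from 1(b): the second integral in the equality above equals D_KL(q_φ(z|x) || p_θ(z|x)) ≥ 0, so dropping it gives

log p_θ(x) ≥ ∫_z q_φ(z|x) log [ p_θ(x, z) / q_φ(z|x) ] dz.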

(b) [1 point] What do people usually call the bound?

 

Your answer:

(c) [1 point] Under what condition will the bound be tight?

 

Your answer:
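One way to state the condition: the gap between log p_θ(x) and the bound is exactly D_KL(q_φ(z|x) || p_θ(z|x)), so the bound is tight if and only if this KL term is zero, i.e. q_φ(z|x) = p_θ(z|x) (the approximate posterior equals the true posterior).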

 



3. [2 points] Given z ∈ R^1, p(z) ∼ N(0, 1) and q(z|x) ∼ N(µ_z, σ_z^2), write D_KL(q(z|x)||p(z)) in terms of σ_z and µ_z.

 

Your answer:
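For reference, the standard closed form for the KL divergence between N(µ_z, σ_z^2) and N(0, 1) works out to

D_KL(q(z|x)||p(z)) = (1/2) ( µ_z^2 + σ_z^2 − log σ_z^2 − 1 ).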

 



4. [1 point] In VAEs, the encoder computes the mean µ_z and the variance σ_z^2 of q_φ(z|x), assuming q_φ(z|x) is Gaussian. Explain why we usually model σ_z^2 in log space, i.e., model log σ_z^2 instead of σ_z^2, when implementing it using neural nets.

 

Your answer:
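A minimal sketch of the usual parameterization (module and dimension names below are illustrative, not taken from the assignment): a linear layer can output any real number, so predicting log σ_z^2 and exponentiating keeps the variance strictly positive without any clamping, and stays numerically stable when the variance is very small or very large.

import torch
import torch.nn as nn

class GaussianEncoderHead(nn.Module):
    # Illustrative encoder head: predicts mu_z and log(sigma_z^2) of q_phi(z|x).
    def __init__(self, hidden_dim=128, latent_dim=16):
        super().__init__()
        self.mu = nn.Linear(hidden_dim, latent_dim)
        self.log_var = nn.Linear(hidden_dim, latent_dim)  # unconstrained real-valued output

    def forward(self, h):
        mu_z = self.mu(h)
        log_var_z = self.log_var(h)     # any real number is a valid prediction
        var_z = torch.exp(log_var_z)    # exp(.) guarantees var_z > 0
        return mu_z, var_z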

 



5. [1 point] Why do we need the reparameterization trick when training VAEs instead of directly sampling from the latent distribution N(µ_z, σ_z^2)?

 

Your answer:
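A minimal sketch of the trick itself (function and variable names are illustrative): sampling z directly from N(µ_z, σ_z^2) is a stochastic operation that gradients cannot propagate through, whereas rewriting z = µ_z + σ_z · ε with ε ∼ N(0, 1) moves the randomness into ε, so z becomes a deterministic, differentiable function of the encoder outputs.

import torch

def reparameterize(mu_z, log_var_z):
    # z = mu_z + sigma_z * eps with eps ~ N(0, I); gradients flow to mu_z and log_var_z,
    # which would not happen if z were drawn from N(mu_z, sigma_z^2) directly.
    sigma_z = torch.exp(0.5 * log_var_z)   # sigma_z = sqrt(var_z)
    eps = torch.randn_like(sigma_z)        # noise drawn independently of the parameters
    return mu_z + sigma_z * eps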
