1. [8 points] Backpropagation
Consider the deep net in the figure below, consisting of an input layer, an output layer, and a hidden layer. The feed-forward computations performed by the deep net are as follows: every input $a_i$ is multiplied by a set of fully-connected weights $u_{ij}$ connecting the input layer to the hidden layer. The resulting weighted signals are then summed and combined with a bias $e_j$, giving the activation signal $z_j = e_j + \sum_i a_i u_{ij}$. The hidden layer applies an activation function $g$ to $z_j$, resulting in the signal $b_j$. In a similar fashion, the hidden-layer activation signals $b_j$ are multiplied by the weights $w_{jk}$ connecting the hidden layer to the output layer, a bias $f_k$ is added, and the resulting signal $h_k$ is transformed by the output activation function $g$ to form the network output $c_k$. The loss between the desired target $t_k$ and the output $c_k$ is given by the MSE: $E = \frac{1}{2} \sum_k (c_k - t_k)^2$, where $t_k$ denotes the ground-truth signal corresponding to $c_k$. Training a neural network involves determining the set of parameters $\theta = \{U, W, e, f\}$ that minimize $E$. This problem can be solved using gradient descent, which requires determining $\frac{\partial E}{\partial \theta}$ for all $\theta$ in the model.
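As a concrete illustration of these feed-forward equations, here is a minimal NumPy sketch (the layer sizes, variable names, and the use of $\sigma$ for both activations are assumptions made for the example, not given by the problem):

import numpy as np

def sigma(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(a, U, e, W, f):
    # a: (n_in,), U: (n_in, n_hid), e: (n_hid,), W: (n_hid, n_out), f: (n_out,)
    z = e + a @ U   # z_j = e_j + sum_i a_i u_ij
    b = sigma(z)    # b_j = g(z_j)
    h = f + b @ W   # h_k = f_k + sum_j b_j w_jk
    c = sigma(h)    # c_k = g(h_k)
    return z, b, h, c

def loss(c, t):
    return 0.5 * np.sum((c - t) ** 2)   # E = 1/2 sum_k (c_k - t_k)^2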
(a) For $g(x) = \sigma(x) = \frac{1}{1+e^{-x}}$, compute the derivative $g'(x)$ of $g(x)$ as a function of $\sigma(x)$.
Your answer:
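A sketch of the derivation, differentiating the quotient directly:
$$g'(x) = \frac{e^{-x}}{(1+e^{-x})^2} = \frac{1}{1+e^{-x}} \left(1 - \frac{1}{1+e^{-x}}\right) = \sigma(x)\,\big(1 - \sigma(x)\big).$$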
(b) We denote by $\delta_k = \frac{\partial E}{\partial h_k}$ the error signal of neuron $k$ in the second linear layer of the network. Compute $\delta_k$ as a function of $c_k$, $t_k$, $g'$, and $h_k$.
Your answer:
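A sketch: since $c_k = g(h_k)$ and only the $k$-th term of $E$ depends on $h_k$, the chain rule gives
$$\delta_k = \frac{\partial E}{\partial c_k}\,\frac{\partial c_k}{\partial h_k} = (c_k - t_k)\,g'(h_k).$$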
(c) Compute $\frac{\partial E}{\partial w_{jk}}$. Use $\delta_k$ and $b_j$.
Your answer:
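A sketch: since $h_k = f_k + \sum_j b_j w_{jk}$, we have $\frac{\partial h_k}{\partial w_{jk}} = b_j$, so
$$\frac{\partial E}{\partial w_{jk}} = \delta_k\, b_j.$$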
(d) Compute $\frac{\partial E}{\partial f_k}$. Use $\delta_k$.
Your answer:
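A sketch: because $\frac{\partial h_k}{\partial f_k} = 1$,
$$\frac{\partial E}{\partial f_k} = \delta_k.$$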
(e) We denote by $\psi_j = \frac{\partial E}{\partial z_j}$ the error signal of neuron $j$ in the first linear layer of the network. Compute $\psi_j$ as a function of $\delta_k$, $w_{jk}$, $g'$, and $z_j$.
Your answer:
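A sketch: since $b_j = g(z_j)$ enters every output pre-activation $h_k$, the chain rule sums over all output units:
$$\psi_j = \sum_k \frac{\partial E}{\partial h_k}\,\frac{\partial h_k}{\partial b_j}\,\frac{\partial b_j}{\partial z_j} = g'(z_j)\sum_k \delta_k\, w_{jk}.$$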
(f) Compute $\frac{\partial E}{\partial u_{ij}}$. Use $\psi_j$ and $a_i$.
Your answer:
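A sketch: since $z_j = e_j + \sum_i a_i u_{ij}$ gives $\frac{\partial z_j}{\partial u_{ij}} = a_i$,
$$\frac{\partial E}{\partial u_{ij}} = \psi_j\, a_i.$$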
(g) Compute $\frac{\partial E}{\partial e_j}$. Use $\psi_j$.
Your answer:
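A sketch: as in part (d), $\frac{\partial z_j}{\partial e_j} = 1$, so
$$\frac{\partial E}{\partial e_j} = \psi_j.$$

These formulas can be sanity-checked numerically against finite differences. A minimal sketch reusing the sigma/forward/loss helpers from the earlier snippet (the shapes and the seed are arbitrary choices for the example):

import numpy as np

rng = np.random.default_rng(0)
a, t = rng.normal(size=3), rng.normal(size=2)
U, e = rng.normal(size=(3, 4)), rng.normal(size=4)
W, f = rng.normal(size=(4, 2)), rng.normal(size=2)

z, b, h, c = forward(a, U, e, W, f)
delta = (c - t) * c * (1 - c)      # part (b): delta_k = (c_k - t_k) g'(h_k), with g'(h_k) = c_k (1 - c_k)
psi = b * (1 - b) * (W @ delta)    # part (e): psi_j = g'(z_j) sum_k delta_k w_jk
dW, df = np.outer(b, delta), delta # parts (c), (d)
dU, de = np.outer(a, psi), psi     # parts (f), (g)

# Central-difference check on a single weight u_00.
eps = 1e-6
U_p, U_m = U.copy(), U.copy()
U_p[0, 0] += eps
U_m[0, 0] -= eps
numeric = (loss(forward(a, U_p, e, W, f)[3], t)
           - loss(forward(a, U_m, e, W, f)[3], t)) / (2 * eps)
print(dU[0, 0], numeric)           # the two values should agree closely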