Provide credit to any sources other than the course staff that helped you solve the problems. This includes all students you talked to regarding the problems.
You can look up definitions/basics online (e.g., Wikipedia, Stack Exchange, etc.).
Submission rules are the same as previous assignments.
Problem 1. (15 points). Consider one layer of a ReLU network. The feature vector $\vec{x}$ is $d$-dimensional. The linear transformation is an $m \times d$ dimensional matrix $W$. The output of the ReLU network is the $m$-dimensional vector $\vec{y}$ given by $\max\{0, W\vec{x}\}$, where the max is applied component-wise.
Suppose $\vec{x}$ is fixed, and all its entries are non-zero.
Suppose the entries in $W$ are all independent, and distributed according to a Gaussian distribution with mean 0 and standard deviation 1 (an $N(0, 1)$ distribution).
1. Show that the expected number of non-zero entries in the output is $m/2$.
2. Suppose $\|\vec{x}\|_2^2 = 2$; what is the distribution of each entry of $W\vec{x}$ (the output before applying the ReLU function)?
3. What is the mean of each entry in $\vec{y}$ (after the ReLU function)?
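A quick way to check your answers empirically is to simulate the layer. Below is a minimal Monte Carlo sketch, assuming NumPy; the particular choice of $\vec{x}$ (with $\|\vec{x}\|_2^2 = 2$ and non-zero entries) is arbitrary:

```python
# Minimal Monte Carlo sanity check for Problem 1 (assumes NumPy; the
# fixed vector x below is one arbitrary choice with ||x||_2^2 = 2).
import numpy as np

rng = np.random.default_rng(0)
m, d, trials = 50, 4, 20000

x = np.full(d, np.sqrt(2.0 / d))        # non-zero entries, ||x||^2 = 2
nonzeros, pre_acts = [], []
for _ in range(trials):
    W = rng.standard_normal((m, d))     # entries i.i.d. N(0, 1)
    z = W @ x                           # pre-activation, one entry per unit
    y = np.maximum(0.0, z)              # component-wise ReLU
    nonzeros.append(np.count_nonzero(y))
    pre_acts.append(z)

pre_acts = np.concatenate(pre_acts)
print("avg # non-zero outputs:", np.mean(nonzeros), "(compare with m/2 =", m / 2, ")")
print("empirical mean/var of entries of Wx:", pre_acts.mean(), pre_acts.var())
print("empirical mean of entries of y:", np.maximum(0.0, pre_acts).mean())
```

Compare the printed statistics against your derived answers for each part.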
Problem 2. (10 points). Consider the setting as in the previous problem, with m = 2, and
d = 2. Let
$$W = \begin{pmatrix} 1 & 2 \\ 2 & 3 \end{pmatrix}, \qquad \vec{x} = \begin{pmatrix} 3 \\ 2 \end{pmatrix}.$$
Consider the function $L = \max\left\{\sigma(W_{(1)}\vec{x}),\ \sigma(W_{(2)}\vec{x})\right\}$, where $\sigma$ is the Sigmoid function and $W_{(i)}$ denotes the $i$th row of $W$. Please draw the computational graph for this function, and compute the gradients (which will be Jacobians at some nodes!).
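If you want to double-check your hand-computed gradients, a finite-difference test is a useful sanity check. The sketch below assumes NumPy and uses the values of $W$ and $\vec{x}$ as reconstructed above; substitute the values from your handout if they differ:

```python
# Finite-difference check of dL/dW for Problem 2 (assumes NumPy; W and x
# follow the reconstruction above -- replace with your handout's values).
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def L(W, x):
    return np.max(sigmoid(W @ x))       # max over the two sigmoid outputs

W = np.array([[1.0, 2.0], [2.0, 3.0]])
x = np.array([3.0, 2.0])

eps = 1e-6
grad = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        grad[i, j] = (L(Wp, x) - L(Wm, x)) / (2 * eps)

print("numerical dL/dW:\n", grad)       # compare with your Jacobian calculation
```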
Problem 3. (10 points). Given inputs $z_1, z_2$, the softmax function is the following:
$$\hat{y} = \frac{e^{z_1}}{e^{z_1} + e^{z_2}}.$$
Let $y \in \{0, 1\}$, and define the cross-entropy loss between $y$ and $\hat{y}$ to be
$$L(y, \hat{y}) = -y \log(\hat{y}) - (1 - y)\log(1 - \hat{y}).$$
Prove that:
$$\frac{\partial L(y, \hat{y})}{\partial z_1} = \hat{y} - y, \qquad \frac{\partial L(y, \hat{y})}{\partial z_2} = y - \hat{y}.$$
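Once you have a proof, you can sanity-check the two identities numerically. A minimal sketch, assuming NumPy and arbitrary test values for $z_1$, $z_2$, and $y$:

```python
# Numerical check of the claimed derivatives in Problem 3 (assumes NumPy;
# z1, z2, y below are arbitrary test values).
import numpy as np

def loss(z1, z2, y):
    y_hat = np.exp(z1) / (np.exp(z1) + np.exp(z2))   # softmax output
    return -y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat)

z1, z2, y, eps = 0.3, -1.1, 1, 1e-6
y_hat = np.exp(z1) / (np.exp(z1) + np.exp(z2))

dz1 = (loss(z1 + eps, z2, y) - loss(z1 - eps, z2, y)) / (2 * eps)
dz2 = (loss(z1, z2 + eps, y) - loss(z1, z2 - eps, y)) / (2 * eps)
print("dL/dz1:", dz1, "vs y_hat - y =", y_hat - y)
print("dL/dz2:", dz2, "vs y - y_hat =", y - y_hat)
```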
Problem 4. (15 points). Consider the datapoints in Figure 1: $(-2, 0)$ and $(2, 0)$ are crosses, and $(0, 2)$ and $(0, -2)$ are circles. Let the crosses be labeled $+1$, and the circles be labeled $-1$. In this problem the goal is to design a neural network with no error on this dataset.

[Figure 1: Neural Networks. Crosses at $(\pm 2, 0)$ and circles at $(0, \pm 2)$, plotted on axes running from $-4$ to $4$.]
To make things simple, consider the following generalization. We first append a $+1$ to each input and form a new dataset as follows: $(-2, 0, 1)$ and $(2, 0, 1)$ are labeled $+1$, and $(0, 2, 1)$ and $(0, -2, 1)$ are labeled $-1$. Note that the last feature is redundant.
We consider the following basic unit for our neural networks: a linear transformation followed by hard thresholding. Each unit has three parameters $w_1, w_2, w_3$. The output of the unit is the sign of the inner product of the parameters with the input.
1. Design a neural network with these units that makes no error on the datapoints above. (Hint: You can take two units in the first layer and one in the output layer, a total of three units. A small evaluation sketch follows this problem.)
2. Show that if you design a neural network with ONLY one such unit, then the points cannot all be classified correctly.
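The harness below evaluates a two-hidden-unit, one-output-unit network of the kind hinted at in part 1, assuming NumPy. The weights shown are hypothetical placeholders, not a solution, and the convention $\mathrm{sign}(0) = +1$ is an assumption; plug in your own parameters and confirm that all four points come out correct:

```python
# Evaluation harness for Problem 4 (assumes NumPy). The weights below are
# hypothetical placeholders, NOT a solution -- substitute your own design.
import numpy as np

def sign_unit(w, v):
    # basic unit: sign of the inner product; sign(0) = +1 assumed here
    return 1 if np.dot(w, v) >= 0 else -1

# the four datapoints with the appended +1 feature, and their labels
data = [((-2, 0, 1), +1), ((2, 0, 1), +1), ((0, 2, 1), -1), ((0, -2, 1), -1)]

w_hidden1 = np.array([1.0, 1.0, 0.0])   # placeholder parameters
w_hidden2 = np.array([1.0, -1.0, 0.0])  # placeholder parameters
w_out = np.array([1.0, 1.0, 0.0])       # placeholder parameters

for x, label in data:
    h1 = sign_unit(w_hidden1, x)
    h2 = sign_unit(w_hidden2, x)
    pred = sign_unit(w_out, (h1, h2, 1))   # hidden outputs plus appended +1
    print(x, "label:", label, "prediction:", pred, "correct:", pred == label)
```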
Problem 5. (40 points). See attached notebook for details.