Task: Work through each set of exercises. If you get stuck on any of the exercises, you can ask Yi or me for help by email or during office hours.
What to submit: Submit your answers for all of the exercises in this document to the appropriate dropbox on the Carmen site. Answers for the concept check and proof sections can be hand-written (e.g., submitted as a scanned image), but please make sure that your writing is readable. Answers to the coding section must be written in Python and must be runnable by the grader.
Due date: Submit your answers to the Carmen dropbox by 11:59pm, Jun. 27th.
Concept check
1. (2pt) Using the alarm network on slide 3 of the Bayesian Inference slides, compute P(B | +j, +m).
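As a reminder of the setup (assuming the usual burglary/earthquake alarm network from the slides, with hidden variables E and A), this kind of query can be computed by enumeration:

P(B | +j, +m) ∝ Σ_e Σ_a P(B) P(e) P(a | B, e) P(+j | a) P(+m | a),

normalized over the two values of B.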
2. (3pt) Refer to the Naive Bayes Classifier shown below. Suppose C has domain {c1, c2, c3} and each Xi is a Boolean variable with values true and false. Using the Bayesian net G, compute the following distribution, showing the manner in which you derived your answer.
P(C | X1 = false, X2 = true, X3 = false).
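As a reminder of how such a query decomposes (this is just the generic naive Bayes factorization; the required CPT values appear in Figure 1 below):

P(C | x1, x2, x3) ∝ P(C) P(x1 | C) P(x2 | C) P(x3 | C),

evaluated at the observed values and then normalized so the three entries sum to 1.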
Figure 1: A Naive Bayes Classifier. The Bayesian net G has class node C with children X1, X2, X3 and the following conditional probability tables:

C:  P(c1) = 0.3,  P(c2) = 0.5
X1: P(true|c1) = 0.7,  P(true|c2) = 0.4,  P(true|c3) = 0.2
X2: P(true|c1) = 0.9,  P(true|c2) = 0.5,  P(true|c3) = 0.7
X3: P(true|c1) = 0.6,  P(true|c2) = 0.4,  P(true|c3) = 0.2

3. (3pt) The sigmoid function

s(z) = 1 / (1 + e^(-z))
has derivative s'(z) = s(z)(1 - s(z)). Moreover, recall that during backpropagation the derivative s'(z) is a factor in the gradient computation used to update the weights of a multilayer perceptron (see slides 28-30 in the neural-nets.pdf slide set). Activation functions like the sigmoid have a "saturation" problem: when z is very large or very small, s(z) is close to 1 or 0, respectively, and so s'(z) is close to 0. As a result, the corresponding gradients will be nearly 0, which slows down training. Affine activation functions with positive slope always have a positive derivative and thus (more or less) do not exhibit saturation, but they have other drawbacks (think back to lab 6). Do a little research and find a non-affine activation function that avoids the saturation problem (hint: ReLU). In your own words, describe how this activation is non-affine and also avoids the saturation problem. Briefly discuss any drawbacks your chosen activation function may have, as well as similar alternatives that avoid these drawbacks.
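For intuition (not something you need to submit), here is a small sketch comparing the derivative of the sigmoid with that of ReLU at large inputs; the ReLU used here is the standard max(0, z):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)          # s'(z) = s(z)(1 - s(z))

def relu_grad(z):
    # derivative of ReLU(z) = max(0, z); taken to be 0 at z = 0
    return 1.0 if z > 0 else 0.0

for z in [0.0, 5.0, 20.0, -20.0]:
    print(f"z = {z:6.1f}   sigmoid'(z) = {sigmoid_grad(z):.3e}   ReLU'(z) = {relu_grad(z):.1f}")

For |z| around 20 the sigmoid gradient is on the order of 1e-9, while the ReLU gradient stays at 1 for all positive inputs.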
Coding
1. (8pt) Implement in Python a convolutional layer (without identity activation) that computes the application of the 3x3 vertical and horizontal Sobel masks below to an input image of size 5x5x3 with zero-padding of size 1. That is, the weights of your convolutional layer will not be learned, but rather hard-coded to match the values of the filters. To make things concrete, use the input volume and masks below:
[Figure: the 5x5x3 input volume, the 3x3 vertical Sobel mask, and the 3x3 horizontal Sobel mask.]
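If you want a starting point, here is a rough sketch of one possible structure in NumPy. It hard-codes the standard 3x3 Sobel values (double-check them against the masks in the figure), uses a placeholder input in place of the actual 5x5x3 volume, and sums the per-channel responses; the function name and these design choices are illustrative only, not a required interface:

import numpy as np

# Standard 3x3 Sobel masks; verify these against the values given in the figure.
VERTICAL_MASK = np.array([[-1, 0, 1],
                          [-2, 0, 2],
                          [-1, 0, 1]], dtype=float)
HORIZONTAL_MASK = np.array([[-1, -2, -1],
                            [ 0,  0,  0],
                            [ 1,  2,  1]], dtype=float)

def conv3x3_zero_pad(image, mask):
    """Apply one 3x3 mask to an HxWxC image with zero-padding of 1.

    The same mask is applied to every channel and the channel responses
    are summed, so the output has shape HxW (no nonlinearity, no bias).
    """
    h, w, c = image.shape
    padded = np.zeros((h + 2, w + 2, c))
    padded[1:h + 1, 1:w + 1, :] = image
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + 3, j:j + 3, :]           # 3x3xC window
            out[i, j] = np.sum(patch * mask[:, :, None])  # same mask on each channel
    return out

if __name__ == "__main__":
    # Placeholder input; replace with the 5x5x3 volume from the figure.
    volume = np.arange(75, dtype=float).reshape(5, 5, 3)
    print(conv3x3_zero_pad(volume, VERTICAL_MASK))
    print(conv3x3_zero_pad(volume, HORIZONTAL_MASK))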
2. (9pt) Implement a perceptron that can learn the Boolean function AND using the threshold activation function.
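Similarly, a minimal sketch of the classic perceptron learning rule with a threshold activation is below; the learning rate, epoch count, and the encoding of true/false as 1/0 are arbitrary choices you are free to change:

import numpy as np

def train_perceptron_and(epochs=20, lr=0.1):
    """Learn Boolean AND with a perceptron using a threshold (step) activation."""
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([0, 0, 0, 1], dtype=float)   # AND truth table
    w = np.zeros(2)
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            pred = 1.0 if np.dot(w, xi) + b > 0 else 0.0  # threshold activation
            err = target - pred
            w += lr * err * xi                            # perceptron update rule
            b += lr * err
    return w, b

w, b = train_perceptron_and()
for xi in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    out = 1 if np.dot(w, np.array(xi, dtype=float)) + b > 0 else 0
    print(xi, "->", out)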
Fun with proofs
1. (5pt) Prove that a multilayer perceptron with one hidden layer of two neurons and an output layer with one neuron is an affine function of the input if the activation function for each neuron is an affine function. To make things simple and concrete, you need only demonstrate the result for the MLP shown below. Briefly explain the implications of this result for using multilayer perceptrons with affine activation functions to learn the XOR data.
Figure 2: Multilayer Perceptron
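Recall the key fact the proof turns on (stated here in generic notation, not tied to the weights in Figure 2): the composition of affine maps is itself affine. If f(x) = Ax + b and g(y) = Cy + d, then

g(f(x)) = C(Ax + b) + d = (CA)x + (Cb + d),

which is again of the form (matrix)x + (vector).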