$29
• WRITTEN EXERCISES
1. Maximum-margin classifiers Suppose we are given n ˘ 7 observa-tions in p ˘ 2 dimensions. For each observation, there is an associ-ated class label.
a. Sketch the observations and the maximum-margin separating hyperplane.
b. Describe the classification rule for the maximal margin classi-fier. It should be something along the lines of “Classify to Red if fl0 ¯fl1 X1 ¯fl2 X2 ¨ 0, and classify to Blue otherwise.” Provide the values for fl0, fl1, and fl2.
X1
X2
Y
3
4
Red
2
2
Red
4
4
Red
1
4
Red
2
1
Blue
4
3
Blue
4
1
Blue
c. On your sketch, indicate the margin for the maximal margin hyperplane.
d. Indicate the support vectors for the maximal margin classifier.
e. Argue that a slight movement of the seventh observation would not affect the maximal mar-gin hyperplane.
CS5785 Fall 2019: Homework 4 Page 2
f. Sketch a hyperplane that separates the data, but is not the maximum-margin separating hy-perplane. Provide the equation for this hyperplane.
g. Draw an additional observation on the plot so that the two classes are no longer separable by a hyperplane.
2. Neural networks as function approximators. Design a feed-forward neural network to approx-imate the 1-dimensional function given in Fig. 1. The output should match exactly. How many hidden layers do you need? How many units are there within each layer? Show the hidden layers, units, connections, weights, and biases.
6
"Pride Rock"
5
4
Output
3
2
1
0
10
1
2
3
4
5
6
7
8
9
10
Input
Figure 1: Example function to approximate using a neural network.
Use the ReLU nonlinearity for every unit. Every possible path from input to output must pass through the same number of layers. This means each layer should have the form
Yi
˘ ¾(Wi YiT¡1 ¯fli ),
(1)
where Yi 2 Rdi £1 is the output of the i th layer, Wi 2 Rdi £di ¡1
is the weight matrix for that layer,
Y0 ˘ x 2 R1£1, and the ReLU nonlinearity is defined as
¢
x
x ‚ 0,
¾(x)
˘
(0
otherwise .
(2)
Hint 1: Try to express the input function as a sum of weighted and scaled ReLU units. Once you can do this, it should be straightforward to turn your final equation into a series of neural network layers with individual weights and biases.
Hint 2: If you’re not convinced neural networks can approximate this function,
see http://neuralnetworksanddeeplearning.com/chap4.html for an example of the flavor of solution we’re looking for. (If you rely on this source, cite it!) However, keep in mind that your output must match the given input exactly.
Ordinarily, such a network could be learned via backpropagation, but we’re instead asking you to figure out the weights to this simple network by hand.
2
CS5785 Fall 2019: Homework 4 Page 3
◦ PROGRAMMING EXERCISES
1. Approximating images with neural networks. In this question, you will implement your own neural network toolkit. You will be writing your own implementation from scratch, using C++ and CUDA. You should calculate the derivatives of each layer by hand using pencil and paper. Please attach a scan of your paper notes to the homework.
Just kidding. We’re not that mean. There are several good convolutional neural network packages that have done the heavy lifting for us. One of the more interesting (and well-written!) demos is called CONVNETJS. It is implemented in Javascript and runs in a modern web browser without any dependencies.
Take a look at convnet.js’s “Image Painting” demo at: http://cs.stanford.edu/people/karpathy/
convnetjs/demo/image_regression.html
a. Describe the structure of the network. How many layers does this network have? What is the purpose of each layer?
b. What does “Loss” mean here? What is the actual loss function? You may need to consult the source code, which is available on Github.
c. Plot the loss over time, after letting it run for 5,000 iterations. How good does the network eventually get?
d. Can you make the network converge to a lower loss function by lowering the learning rate every 1,000 iterations? (Some learning rate schedules, for example, halve the learning rate every n iterations. Does this technique let the network converge to a lower training loss?)
e. Lesion study. The text box contains a small snippet of Javascript code that initializes the net-work. You can change the network structure by clicking the “Reload network” button, which simply evaluates the code. Let’s perform some brain surgery: Try commenting out each layer, one by one. Report some results: How many layers can you drop before the accuracy drops below a useful value? How few hidden units can you get away with before quality drops no-ticeably?
f. Try adding a few layers by copy+pasting lines in the network definition. Can you noticeably increase the accuracy of the network?
2. Random forests for image approximation In this question, you will use random forest regression to approximate an image by learning a function, f : R2 ! R , that takes image (x, y) coordinates as input and outputs pixel brightness. This way, the function learns to approximate areas of the image that it has not seen before.
3
CS5785 Fall 2019: Homework 4 Page 4
a. Start with an image of the Mona Lisa. If you don’t like the Mona Lisa, pick another interesting image of your choice.
Figure 2: Left: http://tinyurl.com/mona-lisa-small Mona Lisa, Leonardo da Vinci, via Wikipedia. Licensed under Public Domain. Middle: Example output of a decision tree regressor. The input is a “feature vector” containing the (x, y) coordinates of the pixel. The output at each point is an (r, g , b) tuple. This tree has a depth of 7. Right: Example output of a k-NN regressor, where k ˘ 1. The output at each pixel is equal to its closest sample from the training set.
b. Preprocessing the input. To build your “training set,” uniformly sample 5,000 random (x, y) coordinate locations.
◦ What other preprocessing steps are necessary for random forests inputs? Describe them, implement them, and justify your decisions. In particular, do you need to perform mean subtraction, standardization, or unit-normalization?
c. Preprocessing the output. Sample pixel values at each of the given coordinate locations. Each pixel contains red, green, and blue intensity values, so decide how you want to handle this. There are several options available to you:
◦ Convert the image to grayscale
◦ Regress all three values at once, so your function maps (x, y) coordinates to (r, g , b) val-ues: f : R2 ! R3
◦ Learn a different function for each channel, fRed : R2 ! R, and likewise for fGr een , fBl ue .
Note that you may need to rescale the pixel intensities to lie between 0.0 and 1.0. (The de-fault for pixel values may be between 0 and 255, but your image library may have different defaults.)
What other preprocessing steps are necessary for random regression forest outputs? Describe them, implement them, and justify your decisions.
d. To build the final image, for each pixel of the output, feed the pixel coordinate through the random forest and color the resulting pixel with the output prediction. You can then use imshow to view the result. (If you are using grayscale, try imshow(Y, cmap=’gray’) to avoid fake-coloring). You may use any implementation of random forests, but you should under-stand the implementation and you must cite your sources.
e. Experimentation.
4
CS5785 Fall 2019: Homework 4 Page 5
i. Repeat the experiment for a random forest containing a single decision tree, but with depths 1, 2, 3, 5, 10, and 15. How does depth impact the result? Describe in detail why.
ii. Repeat the experiment for a random forest of depth 7, but with number of trees equal to 1, 3, 5, 10, and 100. How does the number of trees impact the result? Describe in detail why.
iii. As a simple baseline, repeat the experiment using a k-NN regressor, for k ˘ 1. This means that every pixel in the output will equal the nearest pixel from the “training set.” Compare and contrast the outlook: why does this look the way it does?
iv. Experiment with different pruning strategies of your choice.
f. Analysis.
i. What is the decision rule at each split point? Write down the 1-line formula for the split point at the root node for one the trained decision trees inside the forest. Feel free to define any variables you need.
ii. Why does the resulting image look like the way it does? What shape are the patches of color, and how are they arranged?
iii. Straightforward: How many patches of color may be in the resulting image if the forest contains a single decision tree? Define any variables you need.
iv. Tricky: How many patches of color might be in the resulting image if the forest contains n decision trees? Define any variables you need.
5