Homework 4 Solution

Instructor: Rui Kuang (kuang@cs.umn.edu). TAs: Jungseok Hong (jungseok@umn.edu) and Ujval Bangalore Umesh (banga038@umn.edu).

1. (30 points) Consider the following Multilayer Perceptron (MLP) for binary classification:



[Figure: a two-layer MLP with bias input x0 = 1, inputs x1, x2, hidden units z1, z2, and a single output y.]
For this network we have the following error function:

$$E(\mathbf{w}_1, \mathbf{w}_2, \mathbf{v} \mid \mathcal{X}) = -\sum_t \left[ r^t \log y^t + (1 - r^t)\log(1 - y^t) \right],$$

where $y^t = \mathrm{sigmoid}(v_2 z_2^t + v_1 z_1^t + v_0)$, $z_1^t = \mathrm{sigmoid}(w_{1,2} x_2^t + w_{1,1} x_1^t + w_{1,0})$, and $z_2^t = \mathrm{LReLU}(w_{2,2} x_2^t + w_{2,1} x_1^t + w_{2,0})$, with the leaky rectified linear unit defined as

$$\mathrm{LReLU}(x) = \begin{cases} 0.01x, & \text{for } x < 0 \\ x, & \text{otherwise.} \end{cases}$$

(a) Derive the equations for updating $\{\mathbf{w}_1, \mathbf{w}_2, \mathbf{v}\}$ of the above MLP.



(b) Now, consider shared weights $\mathbf{w} = \mathbf{w}_1 = \mathbf{w}_2$. Derive the equations for updating $\{\mathbf{w}, \mathbf{v}\}$.



Hint: Read Section 11.7.2 to see how Equations 11.23 and 11.24 are derived from Equation 11.22.



Hint 2: $\mathrm{LReLU}'(x) = \begin{cases} 0.01, & \text{for } x < 0 \\ 1, & \text{otherwise.} \end{cases}$
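As a pointer for where the derivation starts (a standard result for cross-entropy with a sigmoid output, cf. Equation 11.23 in the textbook; not a full solution), the output-layer gradient simplifies considerably:

$$\frac{\partial E}{\partial y^t} = -\frac{r^t - y^t}{y^t(1 - y^t)}, \qquad \Delta v_i = -\eta \frac{\partial E}{\partial v_i} = \eta \sum_t (r^t - y^t)\, z_i^t.$$

The updates for $\mathbf{w}_1$ and $\mathbf{w}_2$ follow by the same chain rule, picking up the factor $z_1^t(1 - z_1^t)$ for the sigmoid hidden unit and $\mathrm{LReLU}'(\cdot)$ from Hint 2 for the leaky ReLU hidden unit.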
2. (40 points) Implement a Multilayer Perceptron (MLP) with stochastic gradient descent to classify the optical-digit data. Train your MLPs on the "optdigits_train.txt" data, tune the number of hidden units using the "optdigits_valid.txt" data, and test the prediction performance using the "optdigits_test.txt" data. (Read the submission instructions carefully to prepare your submission files.)



(a) Implement an MLP with 1 hidden layer using the ReLU activation function:








$$\mathrm{ReLU}(x) = \begin{cases} 0, & \text{for } x < 0 \\ x, & \text{otherwise} \end{cases}$$



Use the MLP for classifying the 10 digits. Read the algorithm in Figure 11.11 and Section 11.7.3 in the textbook. When using the ReLU activation function, the online version of Equation 11.29 becomes:




$$\Delta w_{hj} = \begin{cases} 0, & \text{for } \mathbf{w}_h^T \mathbf{x} < 0 \\ \eta \left[ \sum_i (r_i - y_i) v_{ih} \right] x_j, & \text{otherwise} \end{cases}$$
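To make the update concrete, here is a minimal MATLAB sketch of one stochastic step under assumed shapes (all variable names and sizes below are illustrative, not part of the required mlptrain interface):

d = 64; m = 6; k = 10; eta = 1e-3;      % illustrative sizes and step size
W = 0.01 * randn(m, d+1);               % input-to-hidden weights (with bias)
V = 0.01 * randn(k, m+1);               % hidden-to-output weights (with bias)
x = [1; rand(d, 1)];                    % one training sample, bias term first
r = zeros(k, 1); r(3) = 1;              % one-hot label (hypothetical digit 2)

a = W * x;                              % hidden pre-activations
z = [1; max(a, 0)];                     % ReLU activations plus hidden bias
o = V * z;
y = exp(o - max(o)); y = y / sum(y);    % softmax posteriors

delta = (V(:, 2:end)' * (r - y)) .* (a > 0);   % ReLU' gates the backprop
V = V + eta * (r - y) * z';             % output-layer update
W = W + eta * delta * x';               % the hidden-layer update above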
Try MLPs with {3, 6, 9, 12, 15, 18} hidden units. Report and plot the training and validation error rates by the number of hidden units. How many hidden units should you use? Report the error rate on the test set using this number of hidden units.

Hint: When choosing the best step size $\eta$ (between 0 and 1, such as $10^{-5}$), you might need to start with some value and, after a certain number of iterations, decrease your $\eta$ to improve the convergence. Alternatively, you can implement Momentum or an Adaptive Learning Rate (Section 11.8.1 in the textbook).
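For instance, the momentum variant keeps the previous weight change and adds a fraction of it to the current step; a fragment of how that could look inside the SGD loop (alpha and gradW are illustrative names):

alpha = 0.9;                     % momentum factor, a typical illustrative value
dW_prev = zeros(size(W));        % previous update, initialized before the loop
% ... inside the loop, with gradW = dE/dW for the current sample:
dW = -eta * gradW + alpha * dW_prev;   % blend current step with previous one
W = W + dW;
dW_prev = dW;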




(b) Train your MLP with the best number of hidden units obtained. Combine the training set and the validation set into one (training+validation) dataset and run the trained MLP from problem 2(a) on this data. Apply PCA to the values of the hidden units (you can use the Matlab pca() function). Using the projection onto the first 2 principal components, make a plot of the training+validation dataset (similar to Figure 11.18 in the textbook). Use different colors for different digits and label each sample with its corresponding digit (the same as you did in HW3). Repeat the same, projecting the dataset onto the first 3 principal components, and visualize it with a 3-D plot. (Hint: you can use the MATLAB function plot3() to visualize the 3-D data.) Compare the 2-D and 3-D plots and explain the results in the report.



Note: Change the x-axis and y-axis to log scale in order to better visualize the datapoints.
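One way this step could look in MATLAB (a sketch assuming Z is the n × m matrix of hidden-unit values on the combined dataset and labels is the n × 1 vector of digits; the names are illustrative):

[~, score] = pca(Z);                        % principal-component scores
figure; hold on;
gscatter(score(:,1), score(:,2), labels);   % 2-D projection, one color per digit
text(score(:,1), score(:,2), num2str(labels), 'FontSize', 7);  % digit labels

figure;
scatter3(score(:,1), score(:,2), score(:,3), 10, labels, 'filled');  % 3-D view
% plot3() can equivalently be used per digit class, as the hint suggests.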




3. (30 points) MATLAB provides the Deep Learning Toolbox for designing and implementing deep neural networks. In this homework question you will learn how to create simple convolutional neural networks (CNNs) for optdigits classification.



(a) Read the MATLAB documentation² to get familiar with how to:



Load and explore image data.



Define the network architecture.



Specify training/validation options.



Train the network.



Predict the labels of testing data and calculate the classification accuracy.



Read another MATLAB documentation³ to learn how to define your own customized layer.







²https://www.mathworks.com/help/deeplearning/examples/create-simple-deep-learning-network-for-classification.html




³https://www.mathworks.com/help/releases/R2018a/nnet/ug/define-custom-deep-learning-layer.html













(b) Run the dataPreparation.m script to convert the three optdigits .txt files into the required input formats. Modify the examplePreluLayer.m file (as described in the Completed Layer section of the documentation³) to define a class called myLReLULayer, which creates the leaky ReLU layer as defined in Question 1.
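One possible shape for this class (a sketch against the R2018a custom-layer API referenced above; treat it as a starting point, not the required implementation):

classdef myLReLULayer < nnet.layer.Layer
    % Leaky ReLU layer implementing LReLU(x) from Question 1.
    methods
        function layer = myLReLULayer(name)
            layer.Name = name;
            layer.Description = 'Leaky ReLU, slope 0.01 for x < 0';
        end
        function Z = predict(layer, X)
            % Forward pass: 0.01*x for x < 0, x otherwise.
            Z = max(X, 0) + 0.01 * min(X, 0);
        end
        function dLdX = backward(layer, X, Z, dLdZ, memory)
            % Backward pass: LReLU'(x) is 0.01 for x < 0 and 1 otherwise.
            dLdX = dLdZ .* (0.01 + 0.99 * (X >= 0));
        end
    end
end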



(c) Modify the Define Network Architecture section in the main.m file to test the following two CNN structures.



(i) Input layer → 2D convolution layer (1 filter with size 4) → Batch normalization layer → LReLU layer (use your own customized myLReLULayer class) → Fully connected layer → Softmax layer → Classification layer



(ii) Input layer → 2D convolution layer (20 filters with size 3) → Batch normalization layer → LReLU layer (use your own customized myLReLULayer class) → Pooling layer (use max pooling with pool size 3 and stride size 2) → 2D convolution layer (32 filters with size 3) → Batch normalization layer → LReLU layer (use your own customized myLReLULayer class) → Fully connected layer → Softmax layer → Classification layer
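In the Deep Learning Toolbox, structure (ii) might be expressed as the following layer array (a sketch; the [8 8 1] input size assumes the 8×8 optdigits images produced by dataPreparation.m):

layers = [
    imageInputLayer([8 8 1])
    convolution2dLayer(3, 20)           % 20 filters of size 3
    batchNormalizationLayer
    myLReLULayer('lrelu1')
    maxPooling2dLayer(3, 'Stride', 2)   % pool size 3, stride 2
    convolution2dLayer(3, 32)           % 32 filters of size 3
    batchNormalizationLayer
    myLReLULayer('lrelu2')
    fullyConnectedLayer(10)             % 10 digit classes
    softmaxLayer
    classificationLayer];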



For both network structures, take a screenshot of the training progress plots generated by MATLAB, and report the accuracies on the testing data.







Instructions




Solutions to all questions must be presented in a report which includes result explanations and all images and plots.




All programming questions must be written in Matlab; no other programming languages will be accepted. The code must be able to be executed from the Matlab command window on the cselabs machines. Each function must take the inputs in the order specified and print/display the required output to the Matlab command window. For each part, you can submit additional files/functions (as needed) which will be used by the main functions specified below. Put comments in your code so that one can follow the key parts and steps. Please follow the rules strictly. If we cannot run your code, you will receive no credit.




Question 2:










- Train an MLP: mlptrain(train_data.txt: path to the training data file, val_data.txt: path to the validation data, m: number of hidden units, k: number of output units). The function must return the outputs in variables (z: an n × m matrix of hidden unit values, w: an m × (d+1) matrix of input unit weights, and v: a k × (m+1) matrix of hidden unit weights). The function must also print the training and validation error rates for the given function parameters.







- Test an MLP: mlptest(test_data.txt: path to the test data file, w: an m × (d+1) matrix of input unit weights, v: a k × (m+1) matrix of hidden unit weights). The function must return the output in a variable (z: an n × m matrix of hidden unit values), where n is the number of test samples. The function must also print the test set error rate for the given function parameters.




- mlptrain will implement an MLP with d inputs and one input bias unit, m hidden units and one hidden bias unit, and k outputs.




- problem2a.m and problem2b.m: scripts to solve problems 2(a) and 2(b), respectively, calling the appropriate functions.
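For orientation, a skeleton of how mlptest's forward pass could look under these specifications (a sketch only; it assumes the data file is whitespace-delimited with the digit label in the last column, and that output unit i corresponds to digit i-1):

function z = mlptest(test_file, w, v)
    % Forward pass of the trained 1-hidden-layer ReLU MLP.
    data = load(test_file);
    X = [ones(size(data, 1), 1), data(:, 1:end-1)];  % prepend bias column
    labels = data(:, end);
    z = max(X * w', 0);                    % n x m hidden-unit values
    y = [ones(size(z, 1), 1), z] * v';     % n x k output scores
    [~, pred] = max(y, [], 2);             % predicted class index 1..k
    err = mean((pred - 1) ~= labels);      % map index to digit 0..9
    fprintf('Test error rate: %.4f\n', err);
end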




You may find the following built-in Matlab functions useful: repmat() and reshape(). For the optdigits data, the first 64 columns are the data and the last column is the label.










Submission




Things to submit:




hw4_sol.pdf: A PDF document which contains the report with solutions to all questions.



mlptrain.m: The Matlab code of the mlptrain function.



mlptest.m: The Matlab code of the mlptest function.



problem2a.m: Code to solve problem 2(a).



problem2b.m: Code to solve problem 2(b).



myLReLULayer.m: Your own customized leaky ReLU layer from problem 3(b).



main.m: The modified script for the network structure in problem 3(c)(ii).












Any other files, except the data, which are necessary for your code.



Submit: hw4_sol.pdf and a zip file of all other files must be submitted electronically via Canvas.