Question 1. [30 points]
In this question you will implement an autoencoder neural network with a single hidden layer for unsupervised feature extraction from natural images. The following cost function will be minimized:
$$
J_{ae} = \frac{1}{2N}\sum_{m=1}^{N}\left\lVert d^{(m)} - o^{(m)} \right\rVert^{2}
+ \frac{\lambda}{2}\left[\sum_{b=1}^{L_{hid}}\sum_{a=1}^{L_{in}}\left(W_{a,b}^{(1)}\right)^{2}
+ \sum_{c=1}^{L_{out}}\sum_{b=1}^{L_{hid}}\left(W_{b,c}^{(2)}\right)^{2}\right]
+ \beta\sum_{b=1}^{L_{hid}} \mathrm{KL}\!\left(\rho \,\middle\|\, \hat{\rho}_{b}\right) \tag{1}
$$
The first term is the average squared error between the desired response and the network output across training samples. Note that the desired output is the same as the input. The second term enforces Tikhonov regularization on the connection weights with parameter λ. The last term enforces that the hidden-unit activations are sparse, with parameter β controlling the relative weighting of this term. The level of sparsity is tuned via ρ in the KL term (Kullback-Leibler divergence) between a Bernoulli variable with mean ρ and another with mean ρ̂_b, where ρ̂_b is the average activation of hidden unit b across training samples.
a) The file assign3_data1.h5 contains a collection of 16×16 RGB patches extracted from various natural images in data. Preprocess the data by first converting the images to grayscale using a luminosity model: Y = 0.2126 R + 0.7152 G + 0.0722 B. To normalize the data, first remove the mean pixel intensity of each image from itself, and then clip the data range at ±3 standard deviations (measured across all pixels in the data). To prevent saturation of the activation function, map the ±3 std. data range to [0.1, 0.9]. Display 200 random sample patches in RGB format, and separately display the normalized versions of the same patches. Comment on your results.
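The preprocessing steps above can be sketched as follows (a minimal sketch; the `(N, 3, 16, 16)` array layout and function name are assumptions, not given in the assignment):

```python
import numpy as np

def preprocess(patches_rgb):
    """Grayscale conversion and normalization for 16x16 RGB patches.

    patches_rgb: array of shape (N, 3, 16, 16) -- assumed layout.
    Returns an array of shape (N, 16, 16) mapped to [0.1, 0.9].
    """
    # Luminosity grayscale: Y = 0.2126 R + 0.7152 G + 0.0722 B
    y = (0.2126 * patches_rgb[:, 0]
         + 0.7152 * patches_rgb[:, 1]
         + 0.0722 * patches_rgb[:, 2])
    # Remove each image's own mean intensity
    y = y - y.mean(axis=(1, 2), keepdims=True)
    # Clip at +/- 3 standard deviations, measured across all pixels
    std = y.std()
    y = np.clip(y, -3 * std, 3 * std)
    # Map the [-3 std, +3 std] range linearly onto [0.1, 0.9]
    return 0.1 + 0.8 * (y + 3 * std) / (6 * std)
```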
b) Prior to training, initialize the weights and the bias terms as uniform random numbers from the interval [−w_o, w_o], where w_o = sqrt(6/(L_pre + L_post)) and L_pre, L_post are the number of neurons on either side of the connection weights. Write a cost function for the network, [J, Jgrad] = aeCost(We, data, params), that calculates the cost and its partial derivatives. Here We = [W1 W2 b1 b2] is a vector containing the weights for the first and second layers followed by the bias terms; data is of size L_in × N; params is a structure with the following fields: Lin (L_in), Lhid (L_hid), lambda (λ), beta (β), rho (ρ). Use J and Jgrad as inputs to a gradient-descent solver to minimize the cost. Assuming L_hid = 64 and λ = 5×10⁻⁴, experiment with β, ρ to find parameters that work well. Note that performance here is defined based on the 'quality' of the features extracted by the network.
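A sketch of aeCost is given below, assuming sigmoid activations in both layers and L_out = L_in (the field names follow the assignment text; everything else, such as the parameter packing order, is an assumption):

```python
import numpy as np

def aeCost(We, data, params):
    """Cost J of Eq. (1) and its gradient Jgrad, as one flat vector."""
    Lin, Lhid = params["Lin"], params["Lhid"]
    lam, beta, rho = params["lambda"], params["beta"], params["rho"]
    N = data.shape[1]
    # Unpack We = [W1 W2 b1 b2]
    W1 = We[:Lhid * Lin].reshape(Lhid, Lin)
    W2 = We[Lhid * Lin:2 * Lhid * Lin].reshape(Lin, Lhid)
    b1 = We[2 * Lhid * Lin:2 * Lhid * Lin + Lhid][:, None]
    b2 = We[2 * Lhid * Lin + Lhid:][:, None]
    sig = lambda x: 1.0 / (1.0 + np.exp(-x))
    # Forward pass; the desired output is the input itself
    h = sig(W1 @ data + b1)                      # hidden activations, (Lhid, N)
    o = sig(W2 @ h + b2)                         # reconstruction, (Lin, N)
    rho_hat = h.mean(axis=1)                     # average activation per unit
    # The three terms of Eq. (1): error, Tikhonov, KL sparsity
    J = (np.sum((data - o) ** 2) / (2 * N)
         + (lam / 2) * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
         + beta * np.sum(rho * np.log(rho / rho_hat)
                         + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))))
    # Backward pass (sigmoid derivative is a*(1-a))
    d_out = ((o - data) / N) * o * (1 - o)
    kl_term = beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat)) / N
    d_hid = (W2.T @ d_out + kl_term[:, None]) * h * (1 - h)
    Jgrad = np.concatenate([
        (d_hid @ data.T + lam * W1).ravel(),
        (d_out @ h.T + lam * W2).ravel(),
        d_hid.sum(axis=1),
        d_out.sum(axis=1),
    ])
    return J, Jgrad
```

J and Jgrad can then be handed to any gradient-based solver (e.g. `scipy.optimize.minimize` with `jac=True`, or a hand-written gradient-descent loop).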
c) The solver will return the trained network parameters. Display the first layer of connection weights as a separate image for each neuron in the hidden layer. What do the hidden-layer features look like? Are these features representative of natural images?
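One convenient way to inspect the first-layer weights is to tile them into a single mosaic image (a sketch; the 8×8 grid assumes L_hid = 64, and the function name is illustrative):

```python
import numpy as np

def weight_mosaic(W1, patch=16, grid=8):
    """Tile each hidden unit's incoming weights as one patch in a grid.

    W1: (Lhid, patch*patch) first-layer weights, with Lhid = grid*grid.
    Returns a (grid*patch, grid*patch) image, e.g. for
    plt.imshow(mosaic, cmap='gray').
    """
    mosaic = np.zeros((grid * patch, grid * patch))
    for k in range(W1.shape[0]):
        img = W1[k].reshape(patch, patch)
        # Rescale each feature to [0, 1] so their contrast is comparable
        img = (img - img.min()) / (np.ptp(img) + 1e-12)
        r, c = divmod(k, grid)
        mosaic[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch] = img
    return mosaic
```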
d) Retrain the network for 3 different values (low, medium, high) of L_hid ∈ [10, 100] and of λ ∈ [0, 10⁻³], while keeping β, ρ fixed. Display the hidden-layer features as separate images. Comparatively discuss the results you obtained for different combinations of training parameters.
Question 2. [30 points]
The goal of this question is to introduce you to CNN models. You will be experimenting with two demos: one on a CNN model in Python, and a second on a CNN model in one of two popular frameworks (PyTorch or TensorFlow). Download demo_cnn.zip from Moodle and unzip it. The demos are given as Jupyter notebooks along with relevant code and data. The easiest way to install Jupyter with all Python and related dependencies is to install Anaconda. After that you should be able to run through the demos in your browser easily. The point of these demos is that they take you through the training algorithms step by step, and you need to inspect the relevant snippets of code for each step to learn about implementation details.
a) The notebook Convolutional_Networks.ipynb contains demonstrations on a CNN model. You need to run the demo to the end without any errors. You are supposed to convert the outputs of the completed demo to a PDF file and attach it to the project report. You should also comment on your results.
b) The notebooks PyTorch.ipynb and TensorFlow.ipynb contain demonstrations on a CNN model in deep learning frameworks. Please pick a single framework to work with (PyTorch has a Python-like feel but might have limited visualization options, while TensorFlow might have a steeper learning curve but is better equipped with supporting tools). You need to run the selected demo to the end without any errors. You are supposed to convert the outputs of the completed demo to a PDF file and attach it to the project report. You should also comment on your results.
Question 3. [40 points]
In this question we will consider classifying human activity (downstairs = 1, jogging = 2, sitting = 3, standing = 4, upstairs = 5, walking = 6) from movement signals measured with three sensors simultaneously. The file assign3_data3.h5 contains time series of training and testing data (trX and tstX) and their corresponding labels (trY and tstY). The length of each time series is 150 units. The training set consists of 3000 samples, and the test set consists of 600 samples. You are going to implement fundamental recurrent neural network architectures, trained with backpropagation through time, to solve a multi-class time-series classification problem.
a) Using the backpropagation through time algorithm, implement a single-layer recurrent neural network with 128 neurons and a hyperbolic tangent activation function, followed by a multi-layer perceptron with a softmax function for classification. Use: a stochastic gradient descent algorithm, a mini-batch size of 32 samples, a learning rate of 0.1, a momentum rate of 0.85, a maximum of 50 epochs, and weights/biases initialized with the Xavier uniform distribution. Adjust the parameters and the number of hidden layers of the classification neural network to improve network performance. The algorithm should be stopped based on the categorical cross-entropy error on validation data (10% of samples selected from the training data). Report the following: validation error as a function of epoch number, accuracy measured over the test dataset, confusion matrices for the training and test sets, and a discussion of your results.
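The forward pass of this architecture can be sketched as below (a minimal sketch with a single softmax layer on top of the last hidden state; all variable names, and the `(T, B, D)` batch layout, are assumptions):

```python
import numpy as np

def xavier(fan_in, fan_out, rng):
    """Xavier-uniform initialization: U(-w0, w0), w0 = sqrt(6/(fan_in+fan_out))."""
    w0 = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-w0, w0, size=(fan_in, fan_out))

def rnn_forward(X, Wx, Wh, b, Wo, bo):
    """Single tanh RNN layer unrolled over time, then a softmax classifier.

    X: (T, B, D) mini-batch of time series (T = 150 steps here).
    Returns class probabilities of shape (B, n_classes).
    """
    T, B, D = X.shape
    H = Wh.shape[0]
    h = np.zeros((B, H))
    for t in range(T):                        # unroll over the time axis
        h = np.tanh(X[t] @ Wx + h @ Wh + b)   # hidden state carries the past
    logits = h @ Wo + bo                      # classify from the last state
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)
```

For training, the same loop is replayed backwards to accumulate gradients through every time step (BPTT), with momentum-SGD updates on each mini-batch.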
b) For time-series data, it is vital to summarize the past observations in the hidden state and to control this information. For this reason, we consider a better alternative: the long short-term memory (LSTM) neural network. Repeat part a for LSTM. Report the following: validation error as a function of epoch number, accuracy measured over the test set, confusion matrices for the training and test sets, a discussion of your results, and a comparison with the performance in part a.
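A single LSTM time step can be sketched as follows (the stacked-weight layout and gate ordering i, f, o, g are implementation choices, not prescribed by the assignment):

```python
import numpy as np

def lstm_step(x, h, c, W, b):
    """One LSTM step: x (B, D) input, h/c (B, H) hidden and cell state,
    W (D + H, 4H) stacked gate weights, b (4H,) bias."""
    H = h.shape[1]
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = np.concatenate([x, h], axis=1) @ W + b
    i = sig(z[:, :H])            # input gate: how much new info to write
    f = sig(z[:, H:2 * H])       # forget gate: how much of the past to keep
    o = sig(z[:, 2 * H:3 * H])   # output gate: how much state to expose
    g = np.tanh(z[:, 3 * H:])    # candidate cell update
    c_new = f * c + i * g        # cell state summarizes the past observations
    h_new = o * np.tanh(c_new)
    return h_new, c_new
```

The gates are exactly the "control of information" the question refers to: the forget gate decides what to discard, and the additive cell update eases gradient flow through the 150 steps compared with the plain tanh RNN of part a.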
c) Finally, we consider an alternative to LSTM neural networks, called gated recurrent units (GRU for short). Repeat part a for GRU. Report the following: validation error as a function of epoch number, accuracy measured over the test set, confusion matrices for the training and test sets, a discussion of your results, and a comparison with the performance in parts a and b.
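For comparison with the LSTM, one GRU time step merges the cell and hidden state and uses two gates instead of three (a sketch; the weight names Wz, Wr, Wc are illustrative):

```python
import numpy as np

def gru_step(x, h, Wz, Wr, Wc, bz, br, bc):
    """One GRU step: x (B, D) input, h (B, H) hidden state;
    each weight matrix has shape (D + H, H)."""
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    xh = np.concatenate([x, h], axis=1)
    z = sig(xh @ Wz + bz)                        # update gate
    r = sig(xh @ Wr + br)                        # reset gate
    # Candidate state, computed from the reset-scaled previous state
    h_tilde = np.tanh(np.concatenate([x, r * h], axis=1) @ Wc + bc)
    return (1 - z) * h + z * h_tilde             # interpolate old and new
```

With roughly three-quarters of the LSTM's parameters per unit, the GRU is a natural point of comparison for parts a and b on this dataset.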