In this project we are going to implement a neural network to recognize handwritten digits. As in the previous project, you are expected to write a final report explaining how you organized your research group, the algorithms you used, the code you wrote, mathematical derivations (if needed), and the results and conclusions from your study.
Remember to describe how you are splitting the tasks among your team members and provide evidence that you are using GitHub.
Data Set
i.- MNIST database. Read pages 179-180 in Greenbaum and Chartier. A neural network needs a training data set, which you will use to tune the parameters of the network, and a test set. Details of the database are given in Greenbaum and Chartier, and the data set can be downloaded from the book's webpage.
ii.- Plot digits. Implement a program that reads digits from the database (see Fig. 4). Plot a couple of examples from the database to convince yourself that the program is working. Compute the average digits as explained in problem 17a, plot them, and compare them with those on page 180. These three examples should help you make sure that your program is working properly.
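As a starting point, the sketch below shows one way to display digits and compute the per-class averages. It assumes the data has already been read into NumPy arrays (the loading step itself depends on the file format provided on the book's webpage), and the array names are illustrative.

```python
# Sketch: display digits and compute the average digit for each class.
# Assumes the data set has already been loaded into NumPy arrays
# `images` (shape (n_samples, 784), one flattened 28x28 image per row)
# and `labels` (shape (n_samples,)); the loading step depends on the
# format used on the book's webpage.
import numpy as np
import matplotlib.pyplot as plt

def show_digit(row, title=""):
    """Display one 784-pixel row as a 28x28 grayscale image."""
    plt.imshow(row.reshape(28, 28), cmap="gray")
    plt.title(title)
    plt.axis("off")
    plt.show()

def average_digits(images, labels):
    """Return a (10, 784) array whose d-th row is the mean of all images of digit d."""
    return np.array([images[labels == d].mean(axis=0) for d in range(10)])

# Example usage:
# show_digit(images[0], title=f"label = {labels[0]}")
# for d, avg in enumerate(average_digits(images, labels)):
#     show_digit(avg, title=f"average {d}")
```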
iii.- A neuron. The implementation of a neuron is shown in Fig. 1. Each neuron consists of a set of weighted connections and an internal activation function. Assume that a neuron has n input connections (from the data input or from other neurons). We will call the inputs $\{O_1, \ldots, O_n\}$ and the input weights $\{w_1, \ldots, w_n\}$. The value NET is calculated as explained in the figure and is the input of an activation function F (explained in Figure 2). Your next task is to implement a neuron where F is given at the top of Fig. 1 and in Fig. 2.
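A minimal sketch of such a neuron in Python/NumPy (the function names are our own) might look as follows; the activation function F used here is the sigmoid discussed next.

```python
import numpy as np

def sigmoid(net):
    """Logistic activation: F(NET) = 1 / (1 + e^(-NET))."""
    return 1.0 / (1.0 + np.exp(-net))

def neuron(inputs, weights):
    """A single neuron: NET is the weighted sum of the inputs O_1, ..., O_n,
    and OUT is the image of NET under the activation function F."""
    net = np.dot(weights, inputs)   # NET = sum_i w_i O_i
    return sigmoid(net)             # OUT = F(NET)
```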
We will consider the activation function to be the sigmoidal (logistic) function shown in Fig. 2. Notice that its derivative has a nice expression in terms of the OUT value. (Verify that this expression is correct and include the verification in your report.) Analyze this logistic function: what is the output of the activation function for small vs. large values of NET? What other functions could you use, and what would be the effect on the relation between input and output?
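As a hint for the verification (this is the standard computation, to be reproduced in your own words in the report): writing $F(\mathrm{NET}) = (1 + e^{-\mathrm{NET}})^{-1}$,

$$F'(\mathrm{NET}) = \frac{e^{-\mathrm{NET}}}{(1 + e^{-\mathrm{NET}})^{2}} = F(\mathrm{NET})\bigl(1 - F(\mathrm{NET})\bigr) = \mathrm{OUT}\,(1 - \mathrm{OUT}).$$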
iv.- Multilayer network. Figure 3 shows the structure of a network with one input layer, one output layer, and one hidden layer. The last layer in the figure (labeled TARGET) is not part of the network; instead it contains the values of the training set against which we are comparing the output. The INPUT LAYER does not perform calculations; it only takes the values from the data (see Figure 4). Neurons in the HIDDEN and OUTPUT layers contain NET and OUT values (as indicated above). The value $w_{i,j}$ is the weight connecting neuron i in layer 1 with neuron j in layer 2.
Figure 1: Structure of a neuron in the neural network. Given a set of inputs (to the neuron) and weights, the neuron first computes the value NET. The final output of the neuron is the image of NET under the activation function F.
Figure 2: The activation function. The activation function can be any (bounded) differentiable function.
The output layer will produce an OUTPUT that will be compared against the TARGET value. (Remember that we will be training the network; that means we know both the input and the output values.) For instance, if we input the pixels for the number five, we expect the network to output the number five. This will be compared with the target number five. The error can be encoded as 0 or 1 based on whether the network got it right or wrong. The network in Fig. 3 has only two computing layers (one hidden and one output), but a useful network will need more layers and more neurons per layer. Implement a network with a variable number of hidden layers (and neurons per layer) in which each neuron has the structure explained above.
v.- Initializing the network. Initialize the network by assigning a small random number to each weight $w_{i,j}$.
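A possible sketch (the layer sizes shown in the docstring are an illustrative choice, not prescribed by the assignment):

```python
import numpy as np

def init_network(layer_sizes, scale=0.1, rng=None):
    """Create one weight matrix per pair of consecutive layers, filled
    with small random numbers, e.g. init_network([784, 30, 10])."""
    rng = np.random.default_rng() if rng is None else rng
    return [scale * rng.standard_normal((n_in, n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
```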
vi.- Training the network. A network learns by iteratively adapting the values of $w_{i,j}$. Each input is associated with an output; these are called training pairs. Follow these steps (i.e., the general structure of your algorithm):
– Select the next training pair (INPUT, OUTPUT) and apply the INPUT to the network.
– Calculate the output using the network.
Figure 3: Multilayer network. Networks will have one INPUT layer, one OUTPUT layer, and several HIDDEN layers. The number of layers and the number of neurons per layer should be adjusted to the specific problem.
– Calculate the error between the network's output and the desired output.
– Adjust the weights in a way that minimizes the error.
– Repeat the steps above for each training pair.
Calculations are performed by layers; that is, all calculations are performed in the hidden layer before any calculation is performed in the output layer. The same applies when several hidden layers are present. The first two steps above are called the forward pass and the last two the reverse pass.
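A minimal outline of this loop might look as follows; `forward` and `backward` stand for the forward and reverse passes developed below, and their names and signatures are illustrative.

```python
def train(weights, training_pairs, eta=0.05, epochs=1):
    """Outline of the training loop described above."""
    for _ in range(epochs):
        for x, target in training_pairs:          # select the next training pair
            outs = forward(weights, x)            # forward pass
            backward(weights, outs, target, eta)  # reverse pass: adjust the weights
    return weights
```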
Figure 4: Structure of the INPUT layer for digit recognition
Forward pass: Notice that the weights between two layers of neurons can be represented by a matrix $W$, and that if $X$ is the input vector, then $\mathrm{NET} = XW$ and the output vector is $O = F(\mathrm{NET})$. The output vector is then the input vector for the next layer.
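A sketch of the forward pass in this matrix form, reusing the `sigmoid` helper defined in the neuron sketch above (the function name is our own):

```python
import numpy as np

def forward(weights, x):
    """Forward pass: for each layer, NET = X W and O = F(NET); the output
    of one layer is the input to the next. Returns the OUT vector of every
    layer (the raw input counts as layer 0), as the reverse pass needs them."""
    outs = [np.asarray(x, dtype=float)]
    for W in weights:
        net = outs[-1] @ W         # NET = X W
        outs.append(sigmoid(net))  # O = F(NET)
    return outs
```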
Reverse pass. Adjusting the weights of the output layer: The ERROR signal is produced by comparing the OUTPUT with the TARGET value. We will consider the training process for a single weight from neuron p in hidden layer j to neuron q in output layer k (see Fig. 5). First we calculate the ERROR ($= \mathrm{TARGET} - \mathrm{OUT}$) for that output neuron. This is multiplied by the derivative of the activation function for neuron q in layer k:
$$\delta_{q,k} = \mathrm{OUT}_{q,k}\,(1 - \mathrm{OUT}_{q,k})\,(\mathrm{ERROR})$$
This value is further multiplied by the value $\mathrm{OUT}_{p,j}$ of neuron p in hidden layer j and by a training-rate coefficient $\eta \in [0.01, 0.1]$. The result is added to the weight connecting the two neurons:
$$\Delta w_{pq,k} = \eta\,\delta_{q,k}\,\mathrm{OUT}_{p,j},$$
Therefore the configuration of the output layer will change for the next training pair as follows:
$$w_{pq,k}(n+1) = w_{pq,k}(n) + \Delta w_{pq,k}$$
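For concreteness, a sketch of this update for a single weight (variable names are ours; in practice all the weights of a layer would be updated at once with matrix operations):

```python
def update_output_weight(W, out_hidden, out_k, target, p, q, eta=0.05):
    """Update w_{pq,k}, where W is the hidden-to-output weight matrix."""
    error = target[q] - out_k[q]                 # ERROR for output neuron q
    delta = out_k[q] * (1.0 - out_k[q]) * error  # delta_{q,k}
    W[p, q] += eta * delta * out_hidden[p]       # w(n+1) = w(n) + Delta w
    return delta
```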
Figure 5: The training of each weight combines OUT values from the output and input neurons. The vertical line indicates the calculations that are needed to modify each weight
Reverse pass. Adjusting the weights of the hidden layers: Backpropagation trains the hidden layers by "propagating" the error back and adjusting the weights on its way. The algorithm uses the same two equations as above for $\Delta w_{pq,k}$ and $w_{pq,k}(n+1)$; however, the value of $\delta$ needs to be computed differently.
The process is shown in Figure 6. Once $\delta$ has been calculated for the output layer, it is used to compute the value of $\delta$ for each neuron in the hidden layer:
$$\delta_{p,j} = \mathrm{OUT}_{p,j}\,(1 - \mathrm{OUT}_{p,j})\sum_{k}\delta_{q,k}\,w_{pq,k}$$
It is important to realize that all the weights associated with each layer must be adjusted moving back from the output to the first layer.
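In code, the hidden-layer deltas can be computed for all neurons p at once (a sketch, with illustrative names; `W_k` has shape (n_hidden, n_output)):

```python
def hidden_deltas(out_j, deltas_k, W_k):
    """delta_{p,j} = OUT_{p,j} (1 - OUT_{p,j}) times the sum over the
    output neurons of delta_{q,k} w_{pq,k}; W_k @ deltas_k performs
    that sum for every hidden neuron p simultaneously."""
    return out_j * (1.0 - out_j) * (W_k @ deltas_k)
```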
Figure 6: The training of each weight requires a new calculation of $\delta$
In matrix notation this is written as follows: if $D_k$ is the vector of deltas at the output layer, $W_k$ the matrix of weights at the output layer, and $D_j$ the vector of deltas for the hidden layer shown in Fig. 3, then
$$D_j = D_k W_k^{t} \otimes \bigl[O_j \otimes (I - O_j)\bigr]$$
where $\otimes$ is defined to indicate the component-by-component multiplication of two vectors, $O_j$ is the output vector of layer j, and $I$ is the vector with all components equal to 1. Show that this last formula is correct.
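The proof is left to you, but a quick numerical sanity check of the formula (not a substitute for the proof) can be run as follows, with arbitrary small dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
O_j, D_k = rng.random(5), rng.random(3)   # 5 hidden neurons, 3 output neurons
W_k = rng.random((5, 3))
matrix_form = (D_k @ W_k.T) * (O_j * (1.0 - O_j))
componentwise = np.array(
    [O_j[p] * (1 - O_j[p]) * sum(D_k[q] * W_k[p, q] for q in range(3))
     for p in range(5)])
assert np.allclose(matrix_form, componentwise)
```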
vii.- Dependence on parameters. The learning of the network (i.e., the minimization of the error) will depend on the number of layers and the number of neurons per layer; for fixed values of these two parameters, the network will also depend on the size of the training set. Set up a study in which you change the values of these parameters and report the error(s) you obtain (you will obtain one error for the training set and another for the test set, which should be very similar to each other provided the test and training sets are similar enough).
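One possible skeleton for such a study (here `build_and_train` and `error_rate` are placeholders for the routines you develop above, and the parameter grids are illustrative):

```python
# Vary depth, width, and training-set size; record training and test error.
for hidden_layers in [1, 2, 3]:
    for neurons in [10, 30, 100]:
        for n_train in [1000, 5000, 20000]:
            sizes = [784] + [neurons] * hidden_layers + [10]
            net = build_and_train(sizes, images[:n_train], labels[:n_train])
            print(hidden_layers, neurons, n_train,
                  error_rate(net, images[:n_train], labels[:n_train]),  # training error
                  error_rate(net, test_images, test_labels))            # test error
```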
References
– P.D. Wasserman. Neural Computing: Theory and Practice. Van Nostrand Reinhold, 1989.
– A. Greenbaum and T.P. Chartier. Numerical Methods: Design, Analysis, and Computer Implementation of Algorithms. Princeton University Press, 2012.