You may complete this homework assignment either individually or in teams of up to 2 people.
1. Softmax regression (aka multinomial logistic regression) [30 points]: In this problem you will train a softmax regressor to classify images of hand-written digits from the MNIST dataset. The input to the machine will be a 28 × 28-pixel image (converted into a 784-dimensional vector); the output will be a vector of 10 probabilities (one for each digit). Specifically, the machine you create should implement a function $g : \mathbb{R}^{785} \to \mathbb{R}^{10}$, where the $k$th component of $g(\tilde{x})$ (i.e., the probability that input $\tilde{x}$ belongs to class $k$) is given by
$$\frac{\exp(\tilde{x}^\top \tilde{w}_k)}{\sum_{k'=1}^{10} \exp(\tilde{x}^\top \tilde{w}_{k'})}$$
where $\tilde{x} = [x^\top, 1]^\top$.
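The softmax function defined above can be sketched in NumPy as follows. This is a minimal illustration, not a reference solution: the function names are hypothetical, and the weights are assumed to be stored as a 785 × 10 array with one column per class. Subtracting the row maximum before exponentiating is a common numerical safeguard that leaves the probabilities unchanged.

```python
import numpy as np

def append_bias(X):
    """Append a constant 1 to each row: (n, 784) -> (n, 785)."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

def softmax(Z):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    Z = Z - Z.max(axis=1, keepdims=True)
    expZ = np.exp(Z)
    return expZ / expZ.sum(axis=1, keepdims=True)

def predict_proba(X, W):
    """X: (n, 784) images, W: (785, 10) weights (assumed layout).

    Returns an (n, 10) array whose rows are class probabilities.
    """
    return softmax(append_bias(X) @ W)
```

With all-zero weights every class receives probability 1/10, which is a convenient sanity check before training.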
The weights should be trained to minimize the cross-entropy (CE) loss:
$$J(\tilde{w}_1, \ldots, \tilde{w}_{10}) = -\frac{1}{n} \sum_{j=1}^{n} \sum_{k=1}^{10} y_k^{(j)} \log \hat{y}_k^{(j)}$$
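where $n$ is the number of training examples. Note that each $\hat{y}_k$ implicitly depends on all the weights $\tilde{w}_1, \ldots, \tilde{w}_{10}$, where each $\tilde{w}_k = [w_k^\top, b_k]^\top$ stacks the 784 pixel weights $w_k$ and the bias $b_k$ that multiplies the constant 1 in $\tilde{x}$.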
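As a rough sketch, the CE loss can be computed from a matrix of one-hot labels and a matrix of predicted probabilities. The function name and the `eps` clipping constant are assumptions added here to avoid taking the logarithm of zero; they are not part of the assignment.

```python
import numpy as np

def cross_entropy(Y, Y_hat, eps=1e-12):
    """Mean cross-entropy over n examples.

    Y:     (n, 10) one-hot ground-truth labels (assumed encoding).
    Y_hat: (n, 10) predicted class probabilities.
    eps:   hypothetical lower bound that keeps log() finite.
    """
    Y_hat = np.clip(Y_hat, eps, 1.0)
    return -np.sum(Y * np.log(Y_hat)) / Y.shape[0]
```

For a uniform prediction of 1/10 per class, the loss equals log(10) regardless of the labels.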
To get started, first download the MNIST dataset (including both the training and testing subsets)
from the following web links:
• https://s3.amazonaws.com/jrwprojects/small_mnist_train_images.npy
• https://s3.amazonaws.com/jrwprojects/small_mnist_train_labels.npy
• https://s3.amazonaws.com/jrwprojects/small_mnist_test_images.npy
• https://s3.amazonaws.com/jrwprojects/small_mnist_test_labels.npy
These files can be loaded into numpy using np.load.
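A small loading helper might look like the following. It assumes the four files have already been downloaded into a local directory under their original names; the helper name and its arguments are hypothetical. The array shapes and the label encoding are not stated here, so it is worth printing them after loading.

```python
import os
import numpy as np

def load_split(data_dir, split):
    """Load one subset ('train' or 'test') from previously downloaded .npy files."""
    images = np.load(os.path.join(data_dir, f"small_mnist_{split}_images.npy"))
    labels = np.load(os.path.join(data_dir, f"small_mnist_{split}_labels.npy"))
    return images, labels

# Example usage (assumes the files sit in the current directory):
# X_train, Y_train = load_split(".", "train")
# X_test, Y_test = load_split(".", "test")
# print(X_train.shape, Y_train.shape)
```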
Then implement stochastic gradient descent (SGD) as described in the lecture notes. I recommend setting $\tilde{n} = 100$ for this project.
Note that, since there are 785 inputs (including the constant 1 term) and 10 outputs, there will be 10 separate weight vectors, each with 785 components. Alternatively, you can conceptualize the weights as a 10 × 785 matrix.
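One possible shape for the training loop is sketched below. It reads ñ as the mini-batch size, which is an assumption since the lecture notes are not reproduced here, and it stores the weights as a 785 × 10 array (the transpose of the 10 × 785 view). The learning rate, number of epochs, and initialization scale are hypothetical defaults, and labels are assumed to be one-hot. The gradient used is the standard one for softmax with cross-entropy: the transposed inputs times the difference between predictions and labels, averaged over the mini-batch.

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax with the row max subtracted for stability."""
    Z = Z - Z.max(axis=1, keepdims=True)
    expZ = np.exp(Z)
    return expZ / expZ.sum(axis=1, keepdims=True)

def train_sgd(X, Y, batch_size=100, lr=0.1, epochs=10, seed=0):
    """Mini-batch SGD for softmax regression.

    X: (n, d) inputs, Y: (n, c) one-hot labels (assumed encoding).
    Returns W of shape (d + 1, c); the last row holds the bias terms.
    lr, epochs, and seed are hypothetical defaults, not assignment values.
    """
    rng = np.random.default_rng(seed)
    X_tilde = np.hstack([X, np.ones((X.shape[0], 1))])  # append constant 1
    n, d = X_tilde.shape
    W = 0.01 * rng.standard_normal((d, Y.shape[1]))
    for _ in range(epochs):
        order = rng.permutation(n)  # reshuffle once per epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            Xb, Yb = X_tilde[idx], Y[idx]
            Y_hat = softmax(Xb @ W)
            grad = Xb.T @ (Y_hat - Yb) / len(idx)  # gradient of the CE loss
            W -= lr * grad
    return W
```

Printing the training loss every few mini-batches inside the inner loop is a simple way to confirm that it decreases.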
Finally, after optimizing the weights on the training set, compute both (1) the loss and (2) the accuracy (percentage of correctly classified images) on the test set. Include both the cross-entropy loss values and the “percent-correct” accuracy in the screenshot that you submit.
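The two test-set metrics could be computed together, roughly as follows. The function name is hypothetical, the weights are assumed to be a 785 × 10 array, and the labels are assumed to be one-hot so that the true class is the argmax of each label row.

```python
import numpy as np

def evaluate(X, Y, W, eps=1e-12):
    """Return (cross-entropy loss, percent-correct accuracy).

    X: (n, 784) images, Y: (n, 10) one-hot labels, W: (785, 10) weights.
    eps is a hypothetical clipping constant that keeps log() finite.
    """
    X_tilde = np.hstack([X, np.ones((X.shape[0], 1))])
    Z = X_tilde @ W
    Z = Z - Z.max(axis=1, keepdims=True)
    Y_hat = np.exp(Z)
    Y_hat /= Y_hat.sum(axis=1, keepdims=True)
    loss = -np.sum(Y * np.log(np.clip(Y_hat, eps, 1.0))) / X.shape[0]
    accuracy = 100.0 * np.mean(Y_hat.argmax(axis=1) == Y.argmax(axis=1))
    return loss, accuracy
```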
2. Data augmentation [30 points]: To improve generalization accuracy, it is often useful to enlarge your training set by synthesizing new examples from ones you already have. The simplest way to do this is to apply label-preserving transformations, i.e., create new “copies” of some original training examples by altering them in subtle ways such that the label of the copy is always the same as the original. For images, this can be achieved through operations such as rotation, scaling, and translation, as well as adding random noise to the value of each image pixel (e.g., from a Gaussian or Laplacian distribution). For symmetric classes (e.g., 8, 0), you could also use mirroring/flipping, though this is not required for this assignment.
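The four operations named above can be sketched in plain NumPy on a single 2-D image. All function names are hypothetical, pixel values are assumed to lie in [0, 1], and the warp uses nearest-neighbor lookup about the image center, which is a simplification chosen for brevity. Library routines that interpolate will generally give smoother results.

```python
import numpy as np

def translate_image(image, dy, dx):
    """Shift by dy rows and dx columns; vacated pixels become 0."""
    h, w = image.shape
    out = np.zeros_like(image)
    if abs(dy) >= h or abs(dx) >= w:
        return out  # everything was shifted out of the frame
    src = image[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    out[max(0, dy):h - max(0, -dy), max(0, dx):w - max(0, -dx)] = src
    return out

def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise, then clip to [0, 1] (assumed pixel range)."""
    return np.clip(image + rng.normal(0.0, sigma, image.shape), 0.0, 1.0)

def affine_warp(image, matrix):
    """Nearest-neighbor warp about the image center.

    Each output pixel copies the input pixel found at
    matrix @ (output offset from center); lookups outside the image give 0.
    """
    h, w = image.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    rows, cols = np.mgrid[0:h, 0:w]
    offsets = np.stack([rows - cy, cols - cx]).reshape(2, -1)
    src = matrix @ offsets
    src_r = np.rint(src[0] + cy).astype(int)
    src_c = np.rint(src[1] + cx).astype(int)
    valid = (src_r >= 0) & (src_r < h) & (src_c >= 0) & (src_c < w)
    out = np.zeros(h * w, dtype=image.dtype)
    out[valid] = image[src_r[valid], src_c[valid]]
    return out.reshape(h, w)

def rotate_image(image, degrees):
    """Rotate about the center; the sign of degrees sets the direction."""
    t = np.deg2rad(degrees)
    return affine_warp(image, np.array([[np.cos(t), -np.sin(t)],
                                        [np.sin(t), np.cos(t)]]))

def scale_image(image, factor):
    """Zoom in (factor > 1) or out (factor < 1) about the center."""
    return affine_warp(image, np.eye(2) / factor)
```

For digits, small parameter values (a few pixels of shift, a few degrees of rotation, scale factors near 1) are more likely to keep the label intact than large ones.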
You are required to implement all of the following transformations: translation, rotation, scaling, random noise. (For rotation, feel free to use the skimage.transform.rotate method in the skimage package.) Show an example (in a PDF file) of an original and augmented training example for each of
these transformations. Then, use your data augmentation methods to triple the size of your training data, and show that this helps your softmax regression machine to increase its test accuracy compared to training without augmentation.
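Tripling the training set can be sketched as appending two augmented copies of every example to the originals, reusing each example's label. The function below is a hypothetical helper: it assumes images are stored as rows of 784 values, and it takes a list of user-supplied functions that each map a 28 × 28 image to a 28 × 28 image, picking one at random per example.

```python
import numpy as np

def triple_training_set(X, Y, transforms, rng):
    """Return the original data plus two augmented copies.

    X: (n, 784) images, Y: (n, k) labels.
    transforms: list of functions, each (28, 28) -> (28, 28) (user-supplied).
    rng: a numpy.random.Generator used to choose a transform per example.
    Output shapes are (3n, 784) and (3n, k).
    """
    n = X.shape[0]
    copies = [X]
    for _ in range(2):
        choice = rng.integers(len(transforms), size=n)
        augmented = np.stack([
            transforms[choice[i]](X[i].reshape(28, 28)).reshape(-1)
            for i in range(n)
        ])
        copies.append(augmented)
    # Transformations are label-preserving, so labels are simply repeated.
    return np.concatenate(copies), np.concatenate([Y, Y, Y])
```

Training once on the original data and once on the tripled data, with the same hyperparameters, gives the comparison of test accuracies that the assignment asks for.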
In addition to submitting your Python code in a file called homework3_WPIUSERNAME1.py (or homework3_WPIUSERNAME1_WPIUSERNAME2.py for teams), please submit a PDF file containing a screenshot of (1) the last 20 iterations of your gradient descent on the training data. Name the file homework3_WPIUSERNAME1.pdf (or homework3_WPIUSERNAME1_WPIUSERNAME2.pdf for teams).