By turning in this assignment, I agree to abide by the Stanford Honor Code and declare that all of this is my own work.
Overview
In this assignment we will be looking at meta-learning for few-shot classification. You will
(1) Learn how to process and partition data for meta-learning problems, where training is done over a distribution of training tasks p(T).
(2) Implement and train memory-augmented neural networks, which meta-learn through a recurrent network.
(3) Analyze the learning performance for different problem sizes.
(4) Experiment with model parameters and explore how they improve performance.
We will be working with the Omniglot dataset [1], a dataset for one-shot learning which contains 1623 different characters from 50 different alphabets. For each character there are 20 28x28 images. We are interested in training models for K-shot, N-way classification; that is, we want to train a classifier to distinguish between N previously unseen characters, given only K labeled examples of each character.
Submission: To submit your homework, submit one PDF report and one zip file to Gradescope, where the report contains answers to the deliverables listed below and the zip file contains your code (hw1.py, load_data.py) with the filled-in solutions.
Code Overview: The code consists of two files:
load_data.py: Contains code to load batches of images and labels.
hw1.py: Contains the network architecture/loss functions and training script.
There is also the omniglot_resized folder, which contains the data. You should not modify this folder.
Dependencies: We expect code in Python 3.5+ with Pillow, scipy, numpy, tensorflow installed.
Problem 1: Data Processing for Few-Shot Classification
Before training any models, you must write code to sample batches for training. Fill in the sample_batch function in the DataGenerator class in the load_data.py file.
Figure 1: Feed K labeled examples of each of N classes through a network with memory. Then feed a final set of N examples and optimize the network to minimize the loss.
The class already has variables defined for the batch size batch_size (B), the number of classes num_classes (N), and the number of samples per class num_samples_per_class (K). Your code should:
1. Sample N different classes from either the specified train, test, or validation folders.
2. Load K images per class and collect the associated labels.
3. Format the data and return two numpy matrices: one of flattened images with shape [B, K, N, 784] and one of one-hot labels with shape [B, K, N, N].
Helper functions are provided to (1) take a list of folders and return paths to image files/labels, and (2) take an image file path and return a flattened numpy matrix. A rough sketch of the sampling logic is shown below.
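As a minimal illustrative sketch (not the required implementation), the sampling might look like the following. It assumes a helper image_file_to_array(path, dim_input) that loads an image and returns it as a flattened array; that helper name and signature are assumptions here, and your load_data.py provides its own helpers.

import os
import random
import numpy as np

def sample_batch_sketch(folders, batch_size, num_classes, num_samples, image_file_to_array):
    # Illustrative sketch: returns images [B, K, N, 784] and one-hot labels [B, K, N, N].
    images = np.zeros((batch_size, num_samples, num_classes, 784), dtype=np.float32)
    labels = np.zeros((batch_size, num_samples, num_classes, num_classes), dtype=np.float32)
    for b in range(batch_size):
        # Sample N distinct character folders for this task.
        classes = random.sample(folders, num_classes)
        for n, class_folder in enumerate(classes):
            # Sample K images of this character.
            image_files = random.sample(os.listdir(class_folder), num_samples)
            for k, fname in enumerate(image_files):
                images[b, k, n] = image_file_to_array(os.path.join(class_folder, fname), 784)
                labels[b, k, n, n] = 1.0  # one-hot label for class index n
    return images, labels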
Problem 2: Memory Augmented Neural Networks [2, 3]
We will be attempting few-shot classification using memory-augmented neural networks. The idea of memory-augmented networks is to use a classifier with recurrent memory, such that information from the K labeled examples of the unseen classes informs classification through the hidden state of the network.
The data processing will be done as in SNAIL [3]. Specifically, during training, you sample batches of N classes, with K + 1 samples per class. Each label is concatenated with its image, and the first K of these concatenated pairs for each class are sequentially passed through the network. Then the final example of each class is fed through the network, concatenated with zeros instead of the true label. The loss is computed between these final outputs and the ground-truth labels, and is then backpropagated through the network. Note: the loss is only computed on the final set of N examples.
The idea is that the network will learn how to encode the first K examples of each class into its memory such that they can be used to enable accurate classification on the (K + 1)-th example. See Figure 1.
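As a rough numpy illustration of this arrangement (the names here are illustrative; inside the MANN call function the same manipulation must be done with differentiable TensorFlow ops):

import numpy as np

def concat_images_and_masked_labels(images, labels):
    # Illustrative only. images: [B, K+1, N, 784]; labels: [B, K+1, N, N] one-hot.
    # Returns [B, K+1, N, 784 + N], with the labels of the final N examples zeroed out
    # so the network never sees the ground truth for the examples it must classify.
    masked_labels = labels.copy()
    masked_labels[:, -1, :, :] = 0.0
    return np.concatenate([images, masked_labels], axis=-1)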
In the hw1.py file:
1. Fill in the call function of the MANN class to take in an image tensor of shape [B, K + 1, N, 784] and a label tensor of shape [B, K + 1, N, N], and output predicted labels of shape [B, K + 1, N, N]. The layers to use have already been defined for you in the __init__ function. Hint: remember to pass zeros, not the ground-truth labels, for the final N examples.
2. Fill in the loss_function, which takes as input the [B, K + 1, N, N] ground-truth labels and the [B, K + 1, N, N] predicted labels and computes the cross-entropy loss.
Note: both of the above functions will need to be backpropagated through, so they must be written with differentiable TensorFlow operations.
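For instance, a sketch of such a loss, assuming the network outputs unnormalized logits (an illustration, not the required solution):

import tensorflow as tf

def loss_function_sketch(preds, labels):
    # Illustrative only. preds and labels have shape [B, K+1, N, N]; preds are logits.
    # The cross-entropy is computed only on the final N examples of each task.
    last_preds = preds[:, -1, :, :]    # [B, N, N]
    last_labels = labels[:, -1, :, :]  # [B, N, N]
    return tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=last_labels, logits=last_preds))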
Problem 3: Analysis
Once you have completed Problems 1 and 2, you can train your few-shot classification model.
For example, run python hw1.py --num_classes=2 --num_samples=1 --meta_batch_size=4 to run 1-shot, 2-way classification with a meta-batch size of 4. You should observe both the training and test loss go down, and the test accuracy go up.
Now we will examine how performance varies for problems of different sizes. Train models for the following values of K and N:
K=1,N=2
K=1,N=3
K=1,N=4
K=5,N=4
For each configuration, submit a plot of the test accuracy over training iterations and note your observations.
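One simple way to produce such plots, assuming matplotlib is available (it is not among the listed dependencies) and that you record test accuracies during training:

import matplotlib.pyplot as plt

def plot_test_accuracy(iterations, accuracies, k, n, out_path):
    # Plot recorded test accuracy against training iteration for one (K, N) setting.
    plt.figure()
    plt.plot(iterations, accuracies)
    plt.xlabel("Training iteration")
    plt.ylabel("Test accuracy")
    plt.title("{}-shot, {}-way".format(k, n))
    plt.savefig(out_path)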
Problem 4: Experimentation
• Experiment with one parameter of the model that affects its performance, such as the type of recurrent layer, the size of the hidden state, the learning rate, or the number of layers. Show learning curves of how the test accuracy of the model on 1-shot, 3-way classification changes as you vary the parameter. In the caption of the graph, provide a brief rationale for why you chose the parameter and describe what you observed.
• Extra Credit: You can now change the MANN architecture however you want (including adding convolutions). Can you achieve over 60% test accuracy on 1-shot, 5-way classification?
References
[1] Brenden M. Lake, Ruslan Salakhutdinov, and Joshua B. Tenenbaum. Human-level concept learning through probabilistic program induction. Science, 350(6266):1332–1338, 2015.
[2] Adam Santoro, Sergey Bartunov, Matthew Botvinick, Daan Wierstra, and Timothy Lillicrap. Meta-learning with memory-augmented neural networks. In Maria Florina Balcan and Kilian Q. Weinberger, editors, Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 1842–1850, New York, New York, USA, 20–22 Jun 2016. PMLR.
[3] Nikhil Mishra, Mostafa Rohaninejad, Xi Chen, and Pieter Abbeel. Meta-learning with temporal convolutions. CoRR, abs/1707.03141, 2017.