Instructions: Submit two files only: writeup.pdf and code.zip. Only question 2 is a programming question; written answers are expected for all other questions. For all questions, show all your work, including intermediate steps.
1 Reinforcement Learning [40 pts]
(a) Q-table [20 pts]
Imagine that you bought a robot to entertain your child. Your child has three different emotional states: she is either sad, bored, or happy. The robot has two actions: it can either talk or dance. The way that your child interacts with the robot is depicted by the following state-transition diagram. On each edge, the action that causes the transition and the reward associated with the action are marked.
Suppose that your child starts in the BORED state, and that your robot is actively learning its Q-table by selecting actions randomly. It turns out that the first 4 actions were dance, talk, talk, dance. Assume also that the initial values of the Q-table are 0, the discount factor is γ = 0.5, and the learning rate is α = 1. Show the updated Q-table after each of the 4 actions.
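For reference, a minimal sketch of one tabular Q-learning update, assuming the question intends the standard rule Q(s, a) ← Q(s, a) + α [r + γ max over a' of Q(s', a') − Q(s, a)] and that the parameter with value 1 is the learning rate α. The function name q_update and the transition used in the example (reward 2.0, next state HAPPY) are hypothetical placeholders and are not read from the diagram.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=1.0, gamma=0.5):
    """Apply one tabular Q-learning update in place and return the new value."""
    # Value of the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    # Move the current estimate toward the target by a fraction alpha.
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


ACTIONS = ("talk", "dance")
q = defaultdict(float)  # every Q-table entry starts at 0

# Hypothetical transition for illustration only: the reward 2.0 and the
# next state "HAPPY" are placeholders, not values from the diagram.
new_value = q_update(q, "BORED", "dance", 2.0, "HAPPY", ACTIONS)
```

With α = 1 the old estimate is fully overwritten by the target, so each update depends only on the observed reward and the current values of the next state.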
(b) Problem Formulation [20 pts]
You have $20 and will play until you lose all the money or as soon as you double the money (to $40). You can choose between two slot machines: 1) slot machine A costs $10 to play, and will return $20 with probability 0.5 and $0 otherwise; and 2) slot machine B costs $20 to play and will return $30 with probability 0.01 and $0 otherwise. Until you are done, you will choose to play machine A or machine B in each turn. Describe the MDP that captures the above description. Describe the state space, action space, rewards, and transition probabilities. Assume that the discount factor is γ = 1.
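The rules of a single play can be sketched as follows. This is one possible reading of the description, not a complete MDP: the names MACHINES, is_terminal, and outcomes are hypothetical, and the rule that a machine cannot be played when its cost exceeds the current balance is an assumption that the text does not state.

```python
# Costs, payouts, and win probabilities as given in the problem statement.
MACHINES = {
    "A": {"cost": 10, "payout": 20, "p_win": 0.5},
    "B": {"cost": 20, "payout": 30, "p_win": 0.01},
}
GOAL = 40  # play stops once the money has doubled


def is_terminal(balance):
    """The game ends when the money is gone or the goal is reached."""
    return balance <= 0 or balance >= GOAL


def outcomes(balance, machine):
    """Return [(probability, next_balance), ...] for one play of a machine."""
    spec = MACHINES[machine]
    # Assumption: no play is possible once the game is over or when the
    # machine costs more than the current balance.
    if is_terminal(balance) or balance < spec["cost"]:
        return []
    after_paying = balance - spec["cost"]
    return [
        (spec["p_win"], after_paying + spec["payout"]),
        (1 - spec["p_win"], after_paying),
    ]
```

For example, outcomes(20, "A") lists a win that leaves $30 and a loss that leaves $10, each with probability 0.5.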
2 Recurrent Neural Network [60 pts]
In this question, you will experiment with various types of recurrent neural networks (RNNs) in PyTorch. PyTorch is a popular package for dynamic neural networks that can easily handle sequence data of varying length. For GPU acceleration, it is recommended that you perform your experiments in Google’s Colaboratory environment. This is a free cloud service where you can run Python code (including PyTorch) with GPU acceleration. A virtual machine with two CPUs and one Nvidia K80 GPU will run for up to 12 hours, after which it must be restarted. The following steps are recommended:
Create a Python notebook in Google Colab: https://colab.research.google.com
Click on Edit, then Notebook settings, and select None (CPU) or GPU for hardware acceleration.
Install PyTorch by following the instructions on the following page: https://colab.research.google.com/notebooks/snippets/importing_libraries.ipynb Note that you will have to reinstall PyTorch each time that you obtain a virtual machine in Colab. Hence it is recommended to store the code for the installation of PyTorch in a cell that can be executed easily each time you obtain a virtual machine.
Get familiar with PyTorch (http://pytorch.org/) by going through the tutorial “Get familiar with PyTorch: a 60 minute blitz”: http://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html
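After completing the steps above, a short check such as the following can confirm that PyTorch is importable and whether a GPU is visible. It is only a sketch: the helper name describe_torch_setup is hypothetical, and it relies on the torch.__version__ attribute and the torch.cuda.is_available() call.

```python
def describe_torch_setup():
    """Summarize whether PyTorch is installed and which device is available."""
    try:
        import torch
    except ImportError:
        # PyTorch has not been installed on this virtual machine yet.
        return "PyTorch is not installed"
    device = "cuda" if torch.cuda.is_available() else "cpu"
    return f"PyTorch {torch.__version__} is installed; device: {device}"


summary = describe_torch_setup()
print(summary)
```

Running this in the first cell after the installation cell makes it obvious when a freshly obtained virtual machine still needs setup, or when the notebook is running without GPU acceleration.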
Answer the following questions:
Encoder Implementation [20 pts]
Go through the tutorial “Classifying Names with a Character-Level RNN”: http://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial.html
Download the data associated with the tutorial. In your Python notebook within Google Colab, use the following instructions to download the data into the working directory of the virtual machine:
!wget https://download.pytorch.org/tutorial/data.zip
!unzip data.zip
Run the script at the end of the tutorial
Compare the accuracy of the encoder when varying the type of hidden units: linear units, gated recurrent units (GRUs), and long short-term memory (LSTM) units. For linear hidden units, just run the script of the tutorial as it is. For GRUs and LSTMs, modify the code of the tutorial. Hand in the following material:
Electronic copy of your code
Graph that contains 3 curves (linear hidden units, GRUs and LSTM units). The y-axis is the test (validation) negative log likelihood and the x-axis is the number of thousands of iterations.
Explanation of the results (i.e., why some hidden units perform better or worse than other units).
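The quantity on the y-axis can be computed as in the sketch below, assuming the model outputs one log-probability per class for each validation example (as a log-softmax output layer would) and that the reported value is the mean over the validation set. The function name mean_nll and the example numbers are hypothetical.

```python
import math


def mean_nll(log_probs, targets):
    """Average negative log likelihood of the correct classes.

    log_probs: one list of per-class log-probabilities per example.
    targets:   the index of the correct class for each example.
    """
    if len(log_probs) != len(targets):
        raise ValueError("log_probs and targets must have the same length")
    # Sum the log-probability assigned to each correct class, then negate.
    total = -sum(row[t] for row, t in zip(log_probs, targets))
    return total / len(targets)


# Hypothetical two-example validation set with two classes.
example_log_probs = [
    [math.log(0.5), math.log(0.5)],
    [math.log(0.25), math.log(0.75)],
]
example_targets = [0, 1]
loss = mean_nll(example_log_probs, example_targets)
```

Evaluating this on the same held-out examples every thousand iterations, for each type of hidden unit, yields the points for the three curves.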
Decoder Implementation [20 pts]
Go through the tutorial “Generating names with character-level RNN”: http://pytorch.org/tutorials/intermediate/char_rnn_generation_tutorial.html
Download the data associated with the tutorial. In your Python notebook within Google Colab, use the following instructions to download the data into the working directory of the virtual machine:
!wget https://download.pytorch.org/tutorial/data.zip
!unzip data.zip
Run the script at the end of the tutorial
Compare the accuracy of the decoder when varying the information fed as input to the hidden units at each time step: i) previous hidden unit, previous character and category; ii) previous hidden unit and previous character; iii) previous hidden unit and category; iv) previous hidden unit. For i), just run the script of the tutorial as it is. For ii) and iv) modify the code to feed the category only as input to the first hidden unit. For iii) and iv), modify the code to avoid feeding the previous character as input to each hidden unit. Hand in the following material:
Electronic copy of your code
Graph that contains 4 curves (i, ii, iii, iv). The y-axis is the test (validation) negative log likelihood and the x-axis is the number of thousands of iterations.
Explanation of the results (i.e., how does the type of information fed to the hidden units affect the results).
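The four input configurations can be summarized in a small sketch. It assumes that an omitted input is replaced by zeros so that the input size stays fixed; dropping the input and shrinking the layer is an equally valid choice. The names VARIANTS and build_step_input are hypothetical and do not come from the tutorial.

```python
# Which inputs each variant feeds in addition to the previous hidden state.
VARIANTS = {
    "i": {"category": "every_step", "prev_char": True},
    "ii": {"category": "first_step", "prev_char": True},
    "iii": {"category": "every_step", "prev_char": False},
    "iv": {"category": "first_step", "prev_char": False},
}


def build_step_input(variant, category, prev_char, hidden, step):
    """Concatenate the inputs for one time step as a flat list of floats."""
    cfg = VARIANTS[variant]
    # The category is fed at every step, or only at the first step (step 0).
    use_category = cfg["category"] == "every_step" or step == 0
    # Assumption: an omitted input is zeroed out rather than removed.
    cat_part = category if use_category else [0.0] * len(category)
    char_part = prev_char if cfg["prev_char"] else [0.0] * len(prev_char)
    return cat_part + char_part + hidden


# Hypothetical one-hot vectors and hidden state.
category = [1.0, 0.0]
prev_char = [0.0, 1.0, 0.0]
hidden = [0.5, 0.5]
step_input = build_step_input("ii", category, prev_char, hidden, step=3)
```

Keeping the input size identical across the four variants means that only the information content changes between runs, which makes the four curves easier to compare.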
Seq2Seq Implementation [20 pts]
Go through the tutorial “Translation with a sequence to sequence model with attention”: http://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html
Download the data associated with the tutorial. In your Python notebook within Google Colab, use the following instructions to download the data into the working directory of the virtual machine:
!wget https://download.pytorch.org/tutorial/data.zip
!unzip data.zip
Run the script at the end of the tutorial
Compare the accuracy of the seq2seq model with and without attention. For the seq2seq model with attention, just run the script of the tutorial as it is. For the seq2seq model without attention, modify the code of the tutorial. Hand in the following material:
Electronic copy of your code
Graph that contains 2 curves (with attention and without attention). The y-axis is the test (validation) negative log likelihood and the x-axis is the number of thousands of iterations.
Explanation of the results (i.e., how attention affects the results).
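The difference between the two models can be illustrated with a generic sketch in plain Python. It uses dot-product scoring purely for illustration; the tutorial computes its attention weights with learned layers, so this is not the tutorial's mechanism. The function names are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_context(decoder_state, encoder_states):
    """Weighted average of all encoder states (dot-product scoring)."""
    scores = [sum(d * e for d, e in zip(decoder_state, h)) for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states)) for i in range(dim)]
    return context, weights


def no_attention_context(encoder_states):
    """Without attention, the decoder sees only the final encoder state."""
    return list(encoder_states[-1])


# Hypothetical encoder states for a two-token source sentence.
encoder_states = [[1.0, 0.0], [0.0, 1.0]]
context, weights = attention_context([0.0, 0.0], encoder_states)
fixed_context = no_attention_context(encoder_states)
```

With attention the context is recomputed at every decoding step from all encoder states, whereas without it the whole source sentence must be compressed into a single fixed vector.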