
Machine Problem #12 Solution

Note: The assignment will be autograded. It is important that you do not use additional libraries or change the input and output of the provided functions. Start early! Training takes about 3 days on CPU.

 

Part 1: Setup

 

• Remotely connect to an EWS machine.

 

ssh (netid)@remlnx.ews.illinois.edu

 

 

• Load the Python module; this also loads pip and virtualenv.

 

module load python/3.4.3

 

 

• Reuse the virtual environment from mp0.

 

source ~/cs446sp_2018/bin/activate

 

 

• Copy mp12 into your svn directory, and change directory to mp12.

 

cd ~/(netid)

svn cp https://subversion.ews.illinois.edu/svn/sp18-cs446/_shared/mp12 .

cd mp12

 

 

• Install the requirements through pip.

 

pip install -r requirements.txt

 

 

• Prevent svn from checking in the checkpoint directory.

 

svn propset svn:ignore saved_networks_q_learning .

 

 

 

Part 2: Exercise

 

In this part, you will train an AI agent to play the Pong game using Q-learning. On the low level, the game works as follows: we receive the last 4 image frames, which constitute the state of the game, and we get to decide whether to move the paddle to the left, to the right, or not to move it (3 possible actions). After every single choice, the game simulator executes the action and gives us a reward: a +1 reward if the ball went past the opponent, a -1 reward if we missed the ball, and 0 otherwise. Our goal is to move the paddle so that we get lots of reward.

Figure 1: Q-Network

 

Q-learning

This exercise requires you to use Q-learning to train a convolutional neural network for playing the Pong game. We consider the deep Q-network shown in Figure 1. Follow the instructions in the starter code to fill in the missing functions. Include a learning curve plot showing the performance of your implementation: the x-axis should correspond to the number of episodes and the y-axis should show the reward after every episode. Your agent should be performing well after 4-5 million steps.
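As a rough illustration of the requested plot, here is a minimal sketch. It assumes matplotlib is available in your environment (if it is not listed in requirements.txt, generate the plot on your own machine) and that you have collected the per-episode rewards in a Python list; the file and variable names below are made up for this example and are not part of the starter code.

# plot_curve.py -- hypothetical helper for the report; not part of the autograded code
import matplotlib
matplotlib.use("Agg")  # headless backend, useful when running over ssh
import matplotlib.pyplot as plt

def plot_learning_curve(episode_rewards, out_file="learning_curve.png"):
    # episode_rewards: total reward obtained in each training episode
    episodes = range(1, len(episode_rewards) + 1)
    plt.plot(episodes, episode_rewards)
    plt.xlabel("Episode")
    plt.ylabel("Reward per episode")
    plt.title("Q-learning on Pong")
    plt.savefig(out_file)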

 

Function Description

 

• get_action_index: During the observation phase, this function returns a randomly chosen action. Beyond the observation phase, the action with the highest Q-value is returned with probability (1 − epsilon) and a random action is chosen with probability epsilon. Here epsilon controls the trade-off between exploration and exploitation (see the combined sketch after this list).

 

• scale_down_epsilon: During the observation phase, epsilon is set to 1. Beyond the observation phase, epsilon is decreased by (INITIAL_EPSILON − FINAL_EPSILON) / EXPLORE at every step, as long as epsilon is larger than FINAL_EPSILON.

 

• run_selected_action: Feed the selected action into the game simulator to obtain the next frame, the reward, and a boolean Terminal indicating whether the game terminated. This function returns the next state, the reward, and the boolean Terminal. The next state is produced by concatenating the four most recent frames, in order to capture the motion of the ball and paddle.

• compute_target_q: This function computes the target Q-value for all samples in the batch. Distinguish two cases depending on whether the next state is a terminal state: for a terminal state the target is the reward alone, otherwise it is the reward plus the discounted maximum Q-value of the next state.
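The following is a minimal, self-contained sketch of the logic behind the helpers listed above. It is not the starter code: the constant names (INITIAL_EPSILON, FINAL_EPSILON, EXPLORE, GAMMA, NUM_ACTIONS), the frame-stacking shape, and the function signatures are assumptions made for illustration only; the actual signatures in q_learning.py take precedence.

# Illustrative sketch only -- constants, shapes and signatures are assumptions,
# not the interface defined in q_learning.py.
import random
import numpy as np

INITIAL_EPSILON = 1.0    # assumed values; use the constants from the starter code
FINAL_EPSILON = 0.05
EXPLORE = 500000.0
GAMMA = 0.99
NUM_ACTIONS = 3

def get_action_index(q_values, epsilon, step, observe_steps):
    # q_values: 1-D array with one Q-value per action for the current state.
    if step <= observe_steps or random.random() < epsilon:
        return random.randrange(NUM_ACTIONS)   # explore: random action
    return int(np.argmax(q_values))            # exploit: greedy action

def scale_down_epsilon(epsilon, step, observe_steps):
    # Linearly anneal epsilon once the observation phase is over.
    if step > observe_steps and epsilon > FINAL_EPSILON:
        epsilon -= (INITIAL_EPSILON - FINAL_EPSILON) / EXPLORE
    return epsilon

def next_state_from(prev_state, new_frame):
    # prev_state: stack of the 4 most recent frames along the last axis.
    # Drop the oldest frame and append the newest one.
    return np.concatenate((prev_state[:, :, 1:], new_frame[:, :, np.newaxis]), axis=2)

def compute_target_q(rewards, terminals, next_q_values):
    # rewards, terminals: length-B arrays for a batch; next_q_values: B x NUM_ACTIONS
    # array of Q-values predicted for the corresponding next states.
    targets = np.array(rewards, dtype=np.float32)
    for i in range(len(rewards)):
        if not terminals[i]:   # non-terminal: bootstrap from the next state
            targets[i] += GAMMA * np.max(next_q_values[i])
    return targets

The terminal-state case in compute_target_q matters because there is no next state to bootstrap from, so the target reduces to the immediate reward.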

 

 

Relevant Files: q_learning.py

 

Part 3: Writing Tests

In test.py we have provided basic test cases. Feel free to write more. To test the code, run:

 

nose2
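As an illustration, one extra test case for test.py could check the target-Q logic on a tiny hand-computed batch. This is only a sketch: the class name is made up, and the final assertion should call the corresponding function in q_learning.py, whose exact name and argument order may differ from what is assumed here.

import unittest
import numpy as np

class TestTargetQSketch(unittest.TestCase):
    def test_terminal_vs_nonterminal(self):
        # Terminal transition: target is the reward alone.
        # Non-terminal transition: target is reward + gamma * max next Q-value.
        rewards = np.array([1.0, 0.0])
        terminals = np.array([True, False])
        next_q = np.array([[0.2, 0.5, 0.1],
                           [0.3, 0.9, 0.4]])
        gamma = 0.99
        expected = np.array([1.0, gamma * 0.9])
        # Replace the next line with a call into q_learning.py; the expected
        # values above stay the same.
        targets = rewards + (~terminals) * gamma * next_q.max(axis=1)
        np.testing.assert_allclose(targets, expected)

if __name__ == "__main__":
    unittest.main()

Adding this to the provided test.py means nose2 will pick it up together with the existing tests.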

 

 

Part 4: Submit

Submitting the code is equivalent to committing the code. This can be done with the following command:

 

svn commit -m "Some meaningful comment here."

 

 

Lastly, double-check in your browser that you can see your code at

 

https://subversion.ews.illinois.edu/svn/sp18-cs446/(netid)/mp12/

 

 

 

Figure 2: Pong Game
