Implement value iteration

Starting from:

~~$30~~

$24

Home

In this exercise you will implement the value iteration algorithm for Markov Decision Processes.

The value iteration algorithm is applied to a 2D grid decision process, where different locations on a can contain different rewards. The purpose is to compute the value of each location, and the

corresponding policy.

Instructions

^^^^^^^^^^^^

1. To prepare for the exercise, make sure you have consulted the lecture slides

and MyCourses material related to Markov Processes, The Bellman Equation, and

Value Iteration.

2. Copy `template-valueiteration.py` to `valueiteration.py`

3. Read and understand all code

- mdp.py :: This file defines an abstract class providing a general interface

for Markov Decision Processes. No need to edit.

- valueiteration.py :: Declares function related to value iteration

TASKs 2.x are found here.

- gridmdp.py :: This file defines a grid Markov Decision Process by

inheriting from mdp.py. No need to edit.

- gridactions.py :: Defines actions used by gridmdp.py. No need to edit.

- utils.py :: Defines some utility function, notably `argmax` which may come in handy.

4. Implement TASK 2.1, and 2.2

Tasks

^^^^^

- TASK 2.1 :: Implement the `value_of` function.

- TASK 2.2 :: Implement the `value_iteration` function (using `value_of`)

Testing

^^^^^^^

- `python valueiteration.py` :: Will execute a few basic examples on grids.

- `python test_valueiteration.py` :: Will execute a few unit tests.

Good luck!