In this exercise you will implement the value iteration algorithm for Markov Decision Processes.
The algorithm is applied to a 2D grid decision process in which different locations on the grid can contain different rewards. The goal is to compute the value of each location and the corresponding policy.
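For reference, the update that value iteration repeats is commonly written as the Bellman optimality backup below, followed by greedy policy extraction. The notation is an assumption and may differ from the lecture slides: $R(s)$ is the reward of location $s$, $P(s' \mid s, a)$ is the probability of reaching $s'$ by taking action $a$ in $s$, $\gamma$ is the discount factor, and $V_k$ is the value estimate after $k$ sweeps over the grid.

$$
V_{k+1}(s) = R(s) + \gamma \max_{a} \sum_{s'} P(s' \mid s, a)\, V_k(s')
$$

$$
\pi(s) = \arg\max_{a} \sum_{s'} P(s' \mid s, a)\, V(s')
$$

Starting from $V_0(s) = 0$ and repeating the backup until the largest change between two sweeps is small gives the values; the policy is then read off greedily from the converged $V$.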
1. To prepare for the exercise, make sure you have consulted the lecture slides
and MyCourses material related to Markov Processes, the Bellman Equation, and
Value Iteration.
2. Copy `template-valueiteration.py` to `valueiteration.py`
3. Read and understand all of the provided code:
- mdp.py :: This file defines an abstract class providing a general interface
for Markov Decision Processes. No need to edit.
- valueiteration.py :: Declares the functions related to value iteration.
TASKs 2.x are found here.
- gridmdp.py :: This file defines a grid Markov Decision Process by
inheriting from mdp.py. No need to edit.
- gridactions.py :: Defines actions used by gridmdp.py. No need to edit.
- utils.py :: Defines some utility functions, notably `argmax`, which may come in handy.
4. Implement TASKs 2.1 and 2.2
- TASK 2.1 :: Implement the `value_of` function.
- TASK 2.2 :: Implement the `value_iteration` function (using `value_of`)
- `python valueiteration.py` :: Will execute a few basic examples on grids.
- `python test_valueiteration.py` :: Will execute a few unit tests.
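A minimal, self-contained sketch of what TASKs 2.1 and 2.2 compute. It does not use the course files: `ToyGridMDP`, `q_value`, and `value_iteration_sketch` are hypothetical names, and their signatures are assumptions, not the interfaces declared in `mdp.py`, `gridmdp.py`, or `valueiteration.py`. The sketch assumes rewards attached to locations, a discount factor `gamma < 1`, moves that succeed with probability `1 - slip` and otherwise leave the agent in place, and terminal locations whose value is just their reward. `q_value` is one plausible counterpart of `value_of` (the expected value of a single action), and the policy is extracted with the built-in `max(..., key=...)`, a role the exercise's `utils.argmax` would presumably fill.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

State = Tuple[int, int]   # (row, col) location on the grid
Action = Tuple[int, int]  # (d_row, d_col) move

# Hypothetical action set: up, down, left, right.
ACTIONS: List[Action] = [(-1, 0), (1, 0), (0, -1), (0, 1)]


@dataclass
class ToyGridMDP:
    """Hypothetical stand-in for a grid MDP; not the class in gridmdp.py."""
    rows: int
    cols: int
    rewards: Dict[State, float]                  # R(s); unlisted locations get 0
    terminals: Set[State] = field(default_factory=set)
    gamma: float = 0.9                           # discount factor, assumed < 1
    slip: float = 0.0                            # probability that a move fails

    def states(self) -> List[State]:
        return [(r, c) for r in range(self.rows) for c in range(self.cols)]

    def reward(self, s: State) -> float:
        return self.rewards.get(s, 0.0)

    def actions(self, s: State) -> List[Action]:
        # Terminal locations have no actions.
        return [] if s in self.terminals else ACTIONS

    def transitions(self, s: State, a: Action) -> List[Tuple[float, State]]:
        # The move succeeds with probability 1 - slip; otherwise the agent
        # stays put. Moving off the grid also leaves the agent in place.
        r, c = s[0] + a[0], s[1] + a[1]
        target = (r, c) if 0 <= r < self.rows and 0 <= c < self.cols else s
        return [(1.0 - self.slip, target), (self.slip, s)]


def q_value(mdp: ToyGridMDP, V: Dict[State, float], s: State, a: Action) -> float:
    """Expected value of the next state: sum over s' of P(s'|s,a) * V(s')."""
    return sum(p * V[s2] for p, s2 in mdp.transitions(s, a))


def value_iteration_sketch(mdp: ToyGridMDP, tol: float = 1e-9):
    """Repeat the Bellman backup until no value changes by more than tol."""
    V = {s: 0.0 for s in mdp.states()}
    while True:
        V_new: Dict[State, float] = {}
        for s in mdp.states():
            # max over an empty action list (terminal) defaults to 0,
            # so a terminal's value is just its reward.
            best = max((q_value(mdp, V, s, a) for a in mdp.actions(s)), default=0.0)
            V_new[s] = mdp.reward(s) + mdp.gamma * best
        delta = max(abs(V_new[s] - V[s]) for s in V)
        V = V_new
        if delta < tol:
            break
    # Greedy policy with respect to the converged values (terminals excluded).
    policy = {
        s: max(mdp.actions(s), key=lambda a: q_value(mdp, V, s, a))
        for s in mdp.states()
        if mdp.actions(s)
    }
    return V, policy


if __name__ == "__main__":
    # A 1x3 corridor with a rewarding terminal at the right end.
    mdp = ToyGridMDP(rows=1, cols=3, rewards={(0, 2): 1.0}, terminals={(0, 2)})
    V, policy = value_iteration_sketch(mdp)
    for s in mdp.states():
        print(s, round(V[s], 3), policy.get(s))
```

With `slip=0` the corridor values are 1, 0.9, and 0.81, i.e. the terminal reward discounted once per step, and every non-terminal location points right. Setting `slip` above zero makes each backup a genuine expectation over successor locations. The stopping rule here (largest change below `tol`) is a simple choice; some texts scale the threshold by `(1 - gamma) / gamma` to bound the error of the final values.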
Good luck!