Reinforcement Learning Assignment 3 Solution

    • Introduction

The goal of this assignment is to experiment with model-free control, including on-policy learning (Sarsa) and off-policy learning (Q-learning). To gain a deep understanding of the principles of these two iterative approaches and the differences between them, you will implement both Sarsa and Q-learning on the Cliff Walking example.

    • Cliff Walking


















Figure 1: Cliff Walking

Consider the gridworld shown in Figure 1. This is a standard undiscounted, episodic task, with start state (S), goal state (G), and the usual actions causing movement up, down, right, and left. Reward is -1 on all transitions except those into the region marked "The Cliff". Stepping into this region incurs a reward of -100 and sends the agent instantly back to the start.
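The transition and reward rules above can be sketched as a small step function. This is only a minimal sketch: the grid size (4 rows by 12 columns), the positions of S and G in the bottom corners, and the rule that moves off the grid leave the agent in place are assumptions, since Figure 1 is not reproduced here. The names `ROWS`, `COLS`, `is_cliff`, and `step` are hypothetical.

```python
# Assumed layout (not stated in the text): 4 rows x 12 columns,
# start in the bottom-left corner, goal in the bottom-right corner,
# and the cliff occupying the bottom-row cells between them.
ROWS, COLS = 4, 12
START = (3, 0)
GOAL = (3, 11)
ACTIONS = [(-1, 0), (1, 0), (0, 1), (0, -1)]  # up, down, right, left


def is_cliff(state):
    """Return True if the state lies in the assumed cliff region."""
    row, col = state
    return row == ROWS - 1 and 0 < col < COLS - 1


def step(state, action):
    """Apply one action and return (next_state, reward, done)."""
    d_row, d_col = ACTIONS[action]
    # Assumption: a move off the grid leaves the agent at the border.
    row = min(max(state[0] + d_row, 0), ROWS - 1)
    col = min(max(state[1] + d_col, 0), COLS - 1)
    next_state = (row, col)
    if is_cliff(next_state):
        # Reward of -100 and an instant return to the start;
        # the episode itself continues.
        return START, -100, False
    # Every other transition costs -1; the episode ends at the goal.
    return next_state, -1, next_state == GOAL
```

For example, `step(START, 2)` moves right from the start into the assumed cliff region, so it returns the start state with a reward of -100.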

    • Experiment Requirements

Programming language: Python 3

You should build the Cliff Walking environment and search for the optimal travel path using Sarsa and Q-learning, respectively.
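The difference between the two methods lies in the update target: Sarsa bootstraps from the action actually taken in the next state, while Q-learning bootstraps from the greedy action. The sketch below shows the two tabular updates side by side. The step size `alpha=0.5` is an assumed value that the assignment does not specify, `gamma=1.0` reflects the undiscounted task, and the function names are hypothetical.

```python
from collections import defaultdict

N_ACTIONS = 4  # up, down, right, left


def make_q():
    """Create a Q-table mapping each state to a list of action values."""
    return defaultdict(lambda: [0.0] * N_ACTIONS)


def sarsa_update(q, s, a, r, s_next, a_next, done, alpha=0.5, gamma=1.0):
    """On-policy update: the target uses the next action actually chosen."""
    target = r if done else r + gamma * q[s_next][a_next]
    q[s][a] += alpha * (target - q[s][a])


def q_learning_update(q, s, a, r, s_next, done, alpha=0.5, gamma=1.0):
    """Off-policy update: the target uses the best next action value."""
    target = r if done else r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])
```

In a training loop, Sarsa needs the next action to be selected before the update is applied, whereas Q-learning can update first and select the next action afterwards.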

Different settings for ε lead to different amounts of exploration during policy updates. Try several values of ε (e.g. ε = 0.1 and ε = 0) to investigate their impact on performance.
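One way to see the effect of this setting is through the action-selection rule. The sketch below assumes that the parameter being varied is the exploration rate of an ε-greedy policy. The function name `epsilon_greedy` and the random tie-breaking among equally valued actions are illustrative choices, not requirements of the assignment.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index from a list of action values.

    With probability epsilon a uniformly random action is chosen;
    otherwise the highest-valued action is chosen, with ties broken
    at random.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    best = max(q_values)
    return rng.choice([a for a, v in enumerate(q_values) if v == best])
```

With `epsilon=0` the random branch is never taken, so the policy is purely greedy; with `epsilon=0.1` roughly one action in ten is chosen uniformly at random.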










    • Report and Submission

Your report and source code should be compressed into a single archive named "studentID+name".

The files should be submitted on Canvas before Apr. 24, 2020.


















































