Artificial Intelligence Lab #11: Q-learning Solution

Q1) Implement the following iteration:

x_{t+1} = x_t + α_t (y_t − x_t)    (1)

where x_t ∈ R, y_t is a random variable, and α_t > 0 is a step-size. Let us understand how this works by changing the step-size and the random variable:

[25 Marks] Keep α_t = 0.1, 0.01, 0.001 and then

1. y_t is uniform on [−1, 1]. Plot x_t.

2. y_t is uniform on [0, 1]. Plot x_t.


[25 Marks] Keep α_t = 1/(t + 1), α_t = c for some c, c_0 > 0, and then

1. y_t is uniform on [−1, 1]. Plot x_t.

2. y_t is uniform on [0, 1]. Plot x_t.

For all the above cases, plot x_t.
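
One possible way to set up the experiments in Q1 is sketched below. It is a minimal sketch, not the official solution: the helper names (`run_iteration`, `constant`, `harmonic`, `uniform`), the starting point x_0 = 0, the number of steps, and the fixed seed are all assumptions made for illustration. Only the standard library is used, so the trajectories are returned as lists that can be handed to any plotting tool.

```python
import random


def run_iteration(step_size, sample, n_steps=10000, x0=0.0, seed=0):
    """Return [x_0, ..., x_n] for x_{t+1} = x_t + alpha_t * (y_t - x_t)."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatable runs
    xs = [x0]
    for t in range(n_steps):
        alpha = step_size(t)  # alpha_t
        y = sample(rng)       # y_t
        xs.append(xs[-1] + alpha * (y - xs[-1]))
    return xs


def constant(c):
    """Constant step-size: alpha_t = c for every t."""
    return lambda t: c


def harmonic(t):
    """Diminishing step-size: alpha_t = 1 / (t + 1)."""
    return 1.0 / (t + 1)


def uniform(low, high):
    """Sampler for y_t drawn uniformly from [low, high]."""
    return lambda rng: rng.uniform(low, high)


if __name__ == "__main__":
    schedules = {
        "0.1": constant(0.1),
        "0.01": constant(0.01),
        "0.001": constant(0.001),
        "1/(t+1)": harmonic,
    }
    for low, high in [(-1.0, 1.0), (0.0, 1.0)]:
        for name, schedule in schedules.items():
            xs = run_iteration(schedule, uniform(low, high))
            print(f"y ~ U[{low}, {high}], alpha_t = {name}: x_final = {xs[-1]:.4f}")
```

With α_t = 1/(t + 1), x_t is exactly the running average of the samples seen so far, so it settles near the mean of y_t (0 for [−1, 1], 0.5 for [0, 1]). With a constant step-size, x_t keeps fluctuating around that mean, and the fluctuations shrink as the step-size gets smaller. To produce the plots, each returned list can be passed to, for example, matplotlib's `plt.plot(xs)`.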

Q2) Implement value iteration for the grid world using Q-values. This is the same as the second question of the previous lab, except that the values are stored in a 2-D array of Q-values. [30 Marks]
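
A minimal sketch of value iteration on Q-values follows. The grid world of the previous lab is not reproduced in this sheet, so everything about the environment here is an assumption: a 4×4 grid, deterministic moves, a reward of −1 per step, a terminal goal in the bottom-right corner, and a discount of 0.9. The 2-D array is `Q[state][action]`, with states flattened row by row; all names are hypothetical.

```python
N_ROWS, N_COLS = 4, 4                         # assumed grid size
GOAL = (3, 3)                                 # assumed terminal cell
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Deterministic move; bumping into a wall leaves the state unchanged."""
    row, col = state
    d_row, d_col = ACTIONS[action]
    new_row, new_col = row + d_row, col + d_col
    if not (0 <= new_row < N_ROWS and 0 <= new_col < N_COLS):
        new_row, new_col = row, col
    next_state = (new_row, new_col)
    return next_state, -1.0, next_state == GOAL  # assumed reward: -1 per step


def index(state):
    """Flatten (row, col) into a row index of the 2-D Q array."""
    return state[0] * N_COLS + state[1]


def q_value_iteration(gamma=0.9, tol=1e-10, max_sweeps=1000):
    """Repeat Q(s,a) <- r + gamma * max_a' Q(s',a') until the change is tiny."""
    Q = [[0.0] * len(ACTIONS) for _ in range(N_ROWS * N_COLS)]
    for _ in range(max_sweeps):
        new_Q = [row[:] for row in Q]
        delta = 0.0
        for r in range(N_ROWS):
            for c in range(N_COLS):
                s = (r, c)
                if s == GOAL:
                    continue  # terminal state: its Q-values stay at 0
                for a in range(len(ACTIONS)):
                    s2, reward, done = step(s, a)
                    future = 0.0 if done else gamma * max(Q[index(s2)])
                    new_Q[index(s)][a] = reward + future
                    delta = max(delta, abs(new_Q[index(s)][a] - Q[index(s)][a]))
        Q = new_Q
        if delta < tol:
            break
    return Q


if __name__ == "__main__":
    Q = q_value_iteration()
    for r in range(N_ROWS):
        print(" ".join(f"{max(Q[index((r, c))]):6.2f}" for c in range(N_COLS)))
```

The state value can be read back as V(s) = max over a of Q(s, a), and a greedy policy picks the action with the largest entry in each row of `Q`.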

Q3) Implement Q-learning for the grid world. [20 Marks]
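
A minimal sketch of tabular Q-learning is given below. As the lab's grid world is not specified in this sheet, the environment is again an assumption (4×4 grid, deterministic moves, −1 per step, start in the top-left corner, terminal goal in the bottom-right corner), and so are the hyperparameters (constant step-size 0.1, discount 0.9, ε-greedy exploration with ε = 0.1) and every function name. The update is the iteration (1) from Q1 applied to each visited entry of the Q array, with the sampled target r + γ max Q(s', ·) playing the role of y_t.

```python
import random

N_ROWS, N_COLS = 4, 4                         # assumed grid size
START, GOAL = (0, 0), (3, 3)                  # assumed start and terminal cells
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Deterministic move; bumping into a wall leaves the state unchanged."""
    row, col = state
    d_row, d_col = ACTIONS[action]
    new_row, new_col = row + d_row, col + d_col
    if not (0 <= new_row < N_ROWS and 0 <= new_col < N_COLS):
        new_row, new_col = row, col
    next_state = (new_row, new_col)
    return next_state, -1.0, next_state == GOAL  # assumed reward: -1 per step


def index(state):
    """Flatten (row, col) into a row index of the 2-D Q array."""
    return state[0] * N_COLS + state[1]


def q_learning(n_episodes=3000, alpha=0.1, gamma=0.9, epsilon=0.1,
               max_steps=200, seed=0):
    """Learn Q from sampled transitions only, using epsilon-greedy behaviour."""
    rng = random.Random(seed)
    Q = [[0.0] * len(ACTIONS) for _ in range(N_ROWS * N_COLS)]
    for _episode in range(n_episodes):
        s = START
        for _ in range(max_steps):
            i = index(s)
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore
            else:
                best = max(Q[i])                 # exploit, ties broken at random
                a = rng.choice([k for k, q in enumerate(Q[i]) if q == best])
            s2, reward, done = step(s, a)
            target = reward + (0.0 if done else gamma * max(Q[index(s2)]))
            Q[i][a] += alpha * (target - Q[i][a])  # same form as iteration (1)
            s = s2
            if done:
                break
    return Q


def greedy_path_length(Q, max_steps=50):
    """Follow the greedy policy from START; return the step count or None."""
    s, n = START, 0
    while s != GOAL and n < max_steps:
        a = max(range(len(ACTIONS)), key=lambda k: Q[index(s)][k])
        s, _, _ = step(s, a)
        n += 1
    return n if s == GOAL else None


if __name__ == "__main__":
    Q = q_learning()
    print("greedy path length from START:", greedy_path_length(Q))
```

Unlike value iteration, this version never looks at the transition model directly; it only uses the transitions it samples by acting in the grid. On the assumed grid, the shortest route from start to goal takes 6 moves, which gives a simple check on the learned greedy policy.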
