Q1) Implement the following iteration:
x_{t+1} = x_t + α_t (y_t − x_t)    (1)
where x_t ∈ R, y_t is a random variable, and α_t > 0 is a step size. Let us understand how this works by changing the step size and the random variable:
[25 Marks] Keep α_t constant at each of 0.1, 0.01, and 0.001, and for each value consider:
1. y_t is uniform on [−1, 1]. Plot x_t.
2. y_t is uniform on [0, 1]. Plot x_t.
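A minimal sketch of the constant step-size case, using only the Python standard library. The function name `run_iteration`, the starting point `x0 = 0`, the number of steps, and the seed are assumptions made for illustration; the assignment does not fix them.

```python
import random


def run_iteration(alpha, low, high, steps=5000, x0=0.0, seed=0):
    """Iterate x_{t+1} = x_t + alpha * (y_t - x_t) with y_t ~ Uniform[low, high].

    Returns the whole trajectory [x_0, x_1, ..., x_steps] so it can be plotted.
    x0, steps and seed are illustrative choices, not part of the assignment.
    """
    rng = random.Random(seed)
    xs = [x0]
    for _ in range(steps):
        y = rng.uniform(low, high)              # draw the random variable y_t
        xs.append(xs[-1] + alpha * (y - xs[-1]))  # apply iteration (1)
    return xs


if __name__ == "__main__":
    for alpha in (0.1, 0.01, 0.001):
        for low, high in ((-1.0, 1.0), (0.0, 1.0)):
            xs = run_iteration(alpha, low, high)
            print(f"alpha={alpha}, y~U[{low}, {high}]: final x = {xs[-1]:.3f}")
```

Each returned list can be passed to a plotting routine, for example `matplotlib.pyplot.plot(xs)`. One thing to look for in the plots is how a smaller constant step size trades slower movement away from x_0 for smaller fluctuations.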
[25 Marks] Keep α_t = 1/(t + 1) and α_t = c/(t + c_0) for some c, c_0 > 0, and for each choice consider:
1. y_t is uniform on [−1, 1]. Plot x_t.
2. y_t is uniform on [0, 1]. Plot x_t.
For all the above cases, plot x_t.
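The decaying step sizes can be sketched in the same way by passing the step size as a function of t. The helper names (`run_with_schedule`, `harmonic`, `make_shifted`) and the example values c = 1, c_0 = 10 are assumptions for illustration only. With α_t = 1/(t + 1) and t starting at 0, the iterate x_t equals the sample mean of y_0, ..., y_{t−1}, which the sketch checks directly.

```python
import random


def run_with_schedule(step_size, low, high, steps=2000, x0=0.0, seed=0):
    """Iterate x_{t+1} = x_t + step_size(t) * (y_t - x_t), y_t ~ Uniform[low, high].

    Returns (xs, ys): the trajectory of x and the samples that were drawn.
    """
    rng = random.Random(seed)
    xs, ys = [x0], []
    for t in range(steps):
        y = rng.uniform(low, high)
        ys.append(y)
        xs.append(xs[-1] + step_size(t) * (y - xs[-1]))
    return xs, ys


def harmonic(t):
    """Step size 1/(t + 1)."""
    return 1.0 / (t + 1)


def make_shifted(c, c0):
    """Step size c/(t + c0); c and c0 are free parameters (values assumed by the caller)."""
    return lambda t: c / (t + c0)


if __name__ == "__main__":
    xs, ys = run_with_schedule(harmonic, -1.0, 1.0)
    print("final x:", xs[-1], " sample mean of y:", sum(ys) / len(ys))
    xs2, _ = run_with_schedule(make_shifted(1.0, 10.0), 0.0, 1.0)
    print("final x with c=1, c0=10:", xs2[-1])
```

Plotting `xs` for each schedule next to the constant step-size runs makes the difference visible: the decaying schedules keep shrinking the effect of each new sample.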
Q2) Implement value iteration for grid world using Q-values. This is the same as the second question of the previous lab, except that the values are stored in a 2-D array of Q-values (one row per state, one column per action). [30 Marks]
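A rough sketch of value iteration on Q-values follows. The grid world from the previous lab is not described here, so the environment is entirely an assumption: a deterministic 4x4 grid, a terminal goal in the bottom-right cell, a reward of −1 per move, moves into a wall leaving the state unchanged, and a discount factor of 0.9. Only the update Q(s, a) ← r + γ max_a' Q(s', a') is meant to carry over.

```python
N = 4                       # assumed grid side length
GOAL = N * N - 1            # assumed terminal state (bottom-right cell)
GAMMA = 0.9                 # assumed discount factor
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right


def step(state, action):
    """Deterministic transition of the assumed grid: returns (next_state, reward)."""
    row, col = divmod(state, N)
    d_row, d_col = ACTIONS[action]
    new_row = min(max(row + d_row, 0), N - 1)   # moving into a wall keeps the agent in place
    new_col = min(max(col + d_col, 0), N - 1)
    return new_row * N + new_col, -1.0


def q_value_iteration(tol=1e-8, max_sweeps=1000):
    """Sweep over all (state, action) pairs until the largest change is below tol.

    q is a 2-D array (list of lists) indexed as q[state][action].
    Updates are done in place, so later pairs in a sweep see earlier updates.
    """
    q = [[0.0] * len(ACTIONS) for _ in range(N * N)]
    for _ in range(max_sweeps):
        delta = 0.0
        for s in range(N * N):
            if s == GOAL:                       # terminal state keeps Q = 0
                continue
            for a in range(len(ACTIONS)):
                s_next, reward = step(s, a)
                target = reward + GAMMA * max(q[s_next])
                delta = max(delta, abs(target - q[s][a]))
                q[s][a] = target
        if delta < tol:
            break
    return q


if __name__ == "__main__":
    q = q_value_iteration()
    for row in range(N):
        print([round(max(q[row * N + col]), 2) for col in range(N)])
```

The state value is recovered as max over the row `q[s]`, and a greedy policy as the index of that maximum.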
Q3) Implement Q-learning for grid world. [20 Marks]
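A hedged sketch of tabular Q-learning, again on an assumed environment (deterministic 4x4 grid, start in the top-left cell, terminal goal in the bottom-right cell, reward −1 per move). The ε-greedy exploration, the constant step size α = 0.5, the episode count, and the per-episode step cap are illustrative choices, not requirements of the assignment. The update has the same form as iteration (1) in Q1, with the sampled target r + γ max_a' Q(s', a') playing the role of y_t.

```python
import random

N = 4                       # assumed grid side length
GOAL = N * N - 1            # assumed terminal state (bottom-right cell)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right


def step(state, action):
    """Assumed deterministic grid: returns (next_state, reward, done)."""
    row, col = divmod(state, N)
    d_row, d_col = ACTIONS[action]
    new_row = min(max(row + d_row, 0), N - 1)
    new_col = min(max(col + d_col, 0), N - 1)
    next_state = new_row * N + new_col
    return next_state, -1.0, next_state == GOAL


def greedy_action(q, state):
    """Index of the action with the largest Q-value (first one on ties)."""
    return max(range(len(ACTIONS)), key=lambda a: q[state][a])


def q_learning(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration; q[state][action]."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N * N)]
    for _episode in range(episodes):
        state = 0                                # assumed start: top-left cell
        for _move in range(200):                 # cap on episode length (assumption)
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))
            else:
                action = greedy_action(q, state)
            next_state, reward, done = step(state, action)
            target = reward + gamma * max(q[next_state])
            # Same form as iteration (1): x <- x + alpha * (y - x)
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_path(q, start=0, max_steps=50):
    """Follow the greedy policy from start until the goal or max_steps moves."""
    path = [start]
    while path[-1] != GOAL and len(path) <= max_steps:
        next_state, _, _ = step(path[-1], greedy_action(q, path[-1]))
        path.append(next_state)
    return path


if __name__ == "__main__":
    q = q_learning()
    print("greedy path from start:", greedy_path(q))
```

Unlike value iteration, this sketch never reads the transition model directly; it only uses sampled transitions returned by `step`.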