HMM: Search and Rescue
Problem 1 (40 points): You are an interplanetary search and rescue expert who has just received an urgent message: a rover on Mercury has fallen and become trapped in Death Ravine, a deep, narrow gorge on the borders of enemy territory. You zoom over to Mercury to investigate the situation. Death Ravine is a narrow gorge 6 miles long, as shown below. There are volcanic vents at locations A and D, indicated by the triangular symbols at those locations.
[Figure: Death Ravine laid out as six positions A B C D E F from west to east, with volcanic vent symbols at A and D.]
The rover was heavily damaged in the fall, and as a result, most of its sensors are broken. The only ones still functioning are its thermometers, which register only two levels: hot and cold. The rover sends back evidence E = hot when it is at a volcanic vent (A and D), and E = cold otherwise. There is no chance of a mistaken reading. The rover fell into the gorge at position A on day 1, so X1 = A. Let the rover's position on day t be Xt ∈ {A, B, C, D, E, F}. The rover is still executing its original programming, trying to move 1 mile east (i.e. right, towards F) every day.
However, because of the damage, it only moves east with probability 0.8, and it stays in place with probability 0.2. Your job is to figure out where the rover is, so that you can dispatch your rescue-bot.
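Written out explicitly, the model above amounts to the following. (The problem does not specify what happens once the rover reaches the eastern end F; this summary, and the sketch later in this problem, assume it simply stays at F, which does not matter for the first few days since the rover starts at A.)

P(X1 = A) = 1
P(Xt+1 = next(s) | Xt = s) = 0.8 and P(Xt+1 = s | Xt = s) = 0.2 for s in {A, B, C, D, E}, where next(s) is the position one mile east of s
P(Et = hot | Xt = s) = 1 if s is a vent (A or D), and P(Et = cold | Xt = s) = 1 otherwise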
Filtering: Three days have passed since the rover fell into the ravine. The observations were (E1 = hot, E2 = cold, E3 = cold). What is P(X3 | hot1, cold2, cold3), the probability distribution over the rover's position on day 3, given the observations? (This is a probability distribution over the six possible positions.)
Smoothing: What is P(X2 | hot1, cold2, cold3), the probability distribution over the rover's position on day 2, given the observations? (This is a probability distribution over the six possible positions.)
Prediction: What is P(hot4 | hot1, cold2, cold3), the probability of observing hot4 on day 4, given the previous observations on days 1, 2, and 3? (This is a single value, not a distribution.)
Prediction: You decide to attempt to rescue the rover on day 4. However, the transmission of E4 seems to have been corrupted, and so it is not observed. What is the rover's position distribution for day 4 given the same evidence, P(X4 | hot1, cold2, cold3)?
Bonus Question for Extra Credit: What is P(hot4, hot5, cold6 | hot1, cold2, cold3), the probability of observing hot4, hot5, and cold6 on days 4, 5, and 6 respectively, given the previous observations on days 1, 2, and 3? (This is a single value, not a distribution.) Note: The solution is fairly long, so I do not recommend attempting this question unless you have already answered everything else. The max score for HW3, including all extra credits (LaTeX and this question), is 110.
You need to apply the formulas covered in the lecture to arrive at the answers; do not simply guess the answers using intuition. There are a lot of calculations involved, but most of them result in zeros. To save time, if a product of probabilities contains a factor that is 0, just write 0 rather than writing out the entire product. A small computational sketch for checking these quantities is included below.
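The questions above are meant to be answered by hand using the lecture formulas, but the following Python sketch can help check the arithmetic. It is only a sketch under stated assumptions: it hard-codes the move-east/stay probabilities given above, assumes the rover stays put once it reaches F, and all function and variable names are my own choices, not part of the assignment.

# Hedged sketch, not the required hand derivation: a tiny exact-inference check
# for the questions above. It assumes the motion model summarized earlier
# (move east with probability 0.8, stay with probability 0.2, stay put at F)
# and the deterministic hot/cold sensor.

STATES = ["A", "B", "C", "D", "E", "F"]
P_EAST, P_STAY = 0.8, 0.2
HOT = {"A", "D"}  # positions where the thermometer reads hot

def normalize(d):
    z = sum(d.values())
    return {s: p / z for s, p in d.items()} if z else d

def trans(s, s2):
    """P(X_{t+1} = s2 | X_t = s) under the assumed motion model."""
    if s == "F":                        # assumption: the rover stays at F
        return 1.0 if s2 == "F" else 0.0
    i = STATES.index(s)
    if s2 == STATES[i + 1]:
        return P_EAST
    if s2 == s:
        return P_STAY
    return 0.0

def predict(dist):
    """Push a belief over X_t through the motion model to get the day t+1 belief."""
    return {s2: sum(dist[s] * trans(s, s2) for s in STATES) for s2 in STATES}

def update(dist, e):
    """Condition a belief on a hot/cold reading (the sensor is deterministic)."""
    return normalize({s: (p if ((s in HOT) == (e == "hot")) else 0.0)
                      for s, p in dist.items()})

# Filtering for days 1-3 with evidence hot, cold, cold.
f1 = update({s: (1.0 if s == "A" else 0.0) for s in STATES}, "hot")
f2 = update(predict(f1), "cold")
f3 = update(predict(f2), "cold")
print("P(X3 | hot1, cold2, cold3):", f3)

# Smoothing for day 2: P(X2 | e1:3) is proportional to
# P(X2 | e1:2) * sum over x3 of P(cold3 | x3) * P(x3 | X2).
back = {s: sum(trans(s, s2) for s2 in STATES if s2 not in HOT) for s in STATES}
print("P(X2 | hot1, cold2, cold3):", normalize({s: f2[s] * back[s] for s in STATES}))

# Prediction for day 4 (E4 is not observed): apply the motion model once more.
f4 = predict(f3)
print("P(X4 | hot1, cold2, cold3):", f4)
print("P(hot4 | hot1, cold2, cold3):", sum(f4[s] for s in HOT))

Any numbers produced this way should, of course, agree with your hand calculations using the lecture formulas.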
Problem 2 (10 points): Consider the Markov Decision Process (MDP) with transition probabilities and reward function as given in the tables below. Assume the discount factor γ = 1 (i.e., there is no actual discounting).
Transition model T(s, a, s'):

  s   a   s'   T(s, a, s')
  A   1   A    1
  A   1   B    0
  A   2   A    0.5
  A   2   B    0.5
  B   1   A    0
  B   1   B    1
  B   2   A    0
  B   2   B    1

Reward function R(s, a):

  s   a   R(s, a)
  A   1   0
  A   2   1
  B   1   5
  B   2   0
We follow the steps of the Policy Iteration algorithm as explained in class. (A small numerical sketch of these steps is included after the questions below.)
Write down the Bellman equation.
The initial policy is π(A) = 1 and π(B) = 1. That means that action 1 is taken when in state A, and the same action is taken when in state B as well. Calculate the values V2(A) and V2(B) from two iterations of policy evaluation (Bellman equation) after initializing both V0(A) and V0(B) to 0.
Find an improved policy πnew based on the calculated values V2(A) and V2(B).
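The steps above are meant to be carried out by hand, but the following Python sketch can be used to check the arithmetic. It is only an illustration under the stated setup (γ = 1, initial policy π(A) = π(B) = 1, V0 = 0, synchronous Bellman backups); the variable and function names are my own, not part of the assignment.

# Hedged sketch: encode the MDP from the tables above, run two rounds of
# policy evaluation for the initial policy, then one greedy improvement step.

GAMMA = 1.0  # discount factor, as stated in the problem
T = {  # T[(s, a, s')] = transition probability, copied from the tables
    ("A", 1, "A"): 1.0, ("A", 1, "B"): 0.0,
    ("A", 2, "A"): 0.5, ("A", 2, "B"): 0.5,
    ("B", 1, "A"): 0.0, ("B", 1, "B"): 1.0,
    ("B", 2, "A"): 0.0, ("B", 2, "B"): 1.0,
}
R = {("A", 1): 0.0, ("A", 2): 1.0, ("B", 1): 5.0, ("B", 2): 0.0}
STATES, ACTIONS = ["A", "B"], [1, 2]

def q_value(s, a, V):
    """One Bellman backup: R(s, a) + gamma * sum over s' of T(s, a, s') * V(s')."""
    return R[(s, a)] + GAMMA * sum(T[(s, a, s2)] * V[s2] for s2 in STATES)

# Two iterations of policy evaluation for pi(A) = pi(B) = 1, starting from V0 = 0.
pi = {"A": 1, "B": 1}
V = {"A": 0.0, "B": 0.0}
for k in (1, 2):
    V = {s: q_value(s, pi[s], V) for s in STATES}
    print(f"V{k}:", V)

# Greedy policy improvement with respect to the values computed above.
pi_new = {s: max(ACTIONS, key=lambda a: q_value(s, a, V)) for s in STATES}
print("improved policy:", pi_new)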