$24
A*. It first finds the shortest path from the current start state to the goal state according to the current action costs. It then updates the h-values of the states that were expanded by this search to make them larger and thus future A* searches more focused. Adaptive A* searches from the current state of the agent to the target since the h-values estimate the goal distances with respect to a given goal state. Thus, the goal state needs to remain unchanged, and the state of the target remains unchanged while the current state of the agent changes. Adaptive A* can handle action costs that increase over time.
To understand the principle behind Adaptive A*, assume that the action costs remain unchanged to make the description simple. Assume that the h-values are consistent. Let g(s) and f(s) denote the g-values and f-values, respectively, after an A* search from the current state of the agent to the target. Let s denote any state expanded by the A* search. Then, g(s) is the distance from the start state to state s since state s was expanded by the A* search. Similarly, g(sgoal) is the distance from the start state to the goal state. Thus, it holds that g(sgoal ) = gd(sstart ), where gd(s) is the goal distance of state s. Distances satisfy the triangle inequality:
gd(sstart )
g(s) + gd(s)
gd(sstart )
g(s)
gd(s)
g(sgoal )
g(s)
gd(s):
Thus, g(sgoal ) g(s) is an admissible estimate of the goal distance of state s that can be calculated quickly. It can thus be used as a new admissible h-value of state s. Adaptive A* therefore updates the h-values by assigning:
h(s) := g(sgoal ) g(s)
for all states s expanded by the A* search. Let hnew(s) denote the h-values after the updates.
The h-values hnew (s) have several advantages. They are not only admissible but also consistent. The next A* search with the h-values hnew (s) thus continues to find shortest paths without expanding states that have already been expanded by the current A* search. Furthermore, it holds that:
f(s) gd(sstart )
g(s) + h(s) g(sgoal )
h(s) g(sgoal ) g(s)
h(s) hnew (s)
since state s was expanded by the current A* search. Thus, the h-values hnew (s) of all expanded states s are no smaller than the immediately preceeding h-values h(s) and thus, by induction, also all previous h-values, including the user-supplied h-values. An A* search with consistent h-values h1(s) expands no more states than an otherwise identical A* search with consistent h-values h2(s) for the same search problem (except possibly for some states whose f-values are identical to the f-value of the goal state, a fact that we will ignore in the following) if h1(s) h2(s) for all states s. Consequently, the next A* search with the h-values hnew (s) cannot expand more states than with any of the previous h-values, including the user-supplied h-values. It therefore cannot be slower (except possibly for the small amount of runtime needed by the bookkeeping and h-value update operations), but will often expand fewer states and thus be faster. You can read up on Adaptive A* in Koenig and Likhachev, Adaptive A* [Poster Abstract], Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), 1311-1312, 2005.
Figure 7 shows the first two searches of Adaptive A* for the example search problem from Figure 2 (left). The number of cell expansions is smaller for Adaptive A* (20) than for Repeated Forward A* (23), demonstrating the advantage of Adaptive A* over Repeated Forward A*. Figure 7 (left) shows the first A* search of Adaptive A*. All cells have their h-values in the lower left corner. The goal distance of the current cell of the agent is eight. The lower right corners show the updated h-values after the h-values of all grey cells have been updated to eight minus their g-values, which makes it easy to compare them to the h-values before the h-value update in the lower left corners. Cells D2, E1, E2 and E3 have larger h-values than their Manhattan distances to the target, that is, the user-supplied h-values. Figure 7 (right) shows the second A* search of Adaptive A*, where Adaptive A* expands three cells (namely, cells E1, E2 and E3) fewer than Repeated Forward A*, as shown in Figure 5 (right).
Questions
Part 0 - Setup your Environments [10 points]: You will perform all your experiments in the same 50 gridworlds of size 101 101. You first need to generate these environments appropriate. To do so, generate a maze/corridor-like structure
Time Step 1
1 2 3 4 5
A
8
7
6
5
4
A
B
4
10
5
10
6
10
7
10
B
7
6
5
4
3
C
4
10
3
8
4
8
5
8
6
8
C
6
5
5
4
4
3
3
2
2
D
3
8
2
6
3
3
6
8
7
8
D
5
5
4
6
3
3
2
1
1
E
2
6
1
4
0
2
1
1
8
8
E
4
7
2 A8
1
1
0 T
6
3
Time Step 4
1
2
3
4
5
2
9
3
9
4
9
5
9
8
7
6
5
4
2
9
1
7
2
7
3
7
4
7
7
6
6
5
5
4
4
3
3
1
7
0
5
4
8
4
7
5
7
6
6
5A7
4
4
3
2
2
2
7
1
7
3
3
7
9
6
7
g f
5
6
3
3
2
1
1
5
6
3
9
2
9
1
1
7
7
h hnew
6
7
8
1
1
0 T
Figure 7: Adaptive A*
with a depth-first search approach by using random tie breaking.
One potential way to build the environment is the following. You can initially set all of the cells as unvisited. Then you can start from a random cell, mark it as visited and unblocked. Select a random neighbouring cell to visit that has not yet been visited. With 30% probability mark it as blocked. With 70% mark it as unblocked and in this case add it to the stack. A cell that has no unvisited neighbours is a dead-end. When at a dead-end, your algorithm must backtrack to parent nodes on the search tree until it reaches a cell with an unvisited neighbour, continuing the path generation by visiting this new, unvisited cell. If at some point your stack is empty and you have not yet visited all the nodes, you repeat the above process from a node that has not been visited. This process continues until every cell has been visited.
Note that often more than one shortest presumed unblocked path exists. Different search algorithms might then move the agent along different shortest presumed unblocked paths (as seen in Figures 5 and 6), and the agent might observe different cells to be blocked. Then, the shortest presumed unblocked paths and the trajectories of the agent can start to diverge. This is okay since it is difficult to make all search algorithms find the same shortest presumed unblocked path in case there are several of them. There may also be the case that a problem is not solvable.
Build your 50 grid world environments with the above or a similar process of your preference and store them. You are encouraged to consider methods available online for maze generation. Provide a way to load and visualize the grid world environments you have generated.
Part 1 - Understanding the methods [10 points]: Read the (heuristic) search and then read the project description again. admissible and consistent h-values.
chapter in your textbook on uninformed and informed Make sure that you understand A* and the concepts of
Explain in your report why the first move of the agent for the example search problem from Figure 8 is to the east rather than the north given that the agent does not know initially which cells are blocked.
1
2
3
4
5
A
B
C
D
E
A
T
Figure 8: Second Example Search Problem
This project argues that the agent is guaranteed to reach the target if it is not separated from it by blocked cells. Give a convincing argument that the agent in finite gridworlds indeed either reaches the target or discovers that this is impossible in finite time. Prove that the number of moves of the agent until it reaches the target or discovers that this is impossible is bounded from above by the number of unblocked cells squared.
Part 2 - The Effects of Ties [15 points]: Repeated Forward A* needs to break ties to decide which cell to expand next if several cells have the same smallest f-value. It can either break ties in favor of cells with smaller g-values or in favor of cells with larger g-values. Implement and compare both versions of Repeated Forward A* with respect to their runtime or, equivalently, number of expanded cells. Explain your observations in detail, that is, explain what you observed and give a reason for the observation.
[Hint: For the implementation part, priorities can be integers rather than pairs of integers. For example, you can use c f(s) g(s) as priorities to break ties in favor of cells with larger g-values, where c is a constant larger than the largest g-value of any generated cell. For the explanation part, consider which cells both versions of Repeated Forward A* expand for the example search problem from Figure 9.]
1 2 3 4 5
A
6A
6
10
5
4
5
6
B
6
10
5
8
4
8
3
8
5
4
3
4
5
C
6
10
5
8
4
6
3
6
2
6
4
3
2
3
4
D
7
10
6
8
3
3
2
4
1
4
3
2
3
3
2
3
E
8
10
7
8
8
8
1
1
0
2
2
1
0
1
1
2 T
Figure 9: Third Example Search Problem
Part 3 - Forward vs. Backward [20 points]: Implement and compare Repeated Forward A* and Repeated Backward A* with respect to their runtime or, equivalently, number of expanded cells. Explain your observations in detail, that is, explain what you observed and give a reason for the observation. Both versions of Repeated A* should break ties among cells with the same f-value in favor of cells with larger g-values and remaining ties in an identical way, for example randomly.
Part 4 - Heuristics in the Adaptive A* [20 points]: The project argues that “the Manhattan distances are consistent in gridworlds in which the agent can move only in the four main compass directions.” Prove that this is indeed the case.
Furthermore, it is argued that “The h-values hnew (s) ... are not only admissible but also consistent.” Prove that Adaptive A* leaves initially consistent h-values consistent even if action costs can increase.
Part 5 - Heuristics in the Adaptive A* [15 points]: Implement and compare Repeated Forward A* and Adaptive A* with respect to their runtime. Explain your observations in detail, that is, explain what you observed and give a reason for the observation. Both search algorithms should break ties among cells with the same f-value in favor of cells with larger g-values and remaining ties in an identical way, for example randomly.
Part 6 - Memory Issues [10 points]: You performed all experiments in gridworlds of size 101 101 but some real-time computer games use maps whose number of cells is up to two orders of magnitude larger than that. It is then especially important to limit the amount of information that is stored per cell. For example, the tree-pointers can be implemented with only two bits per cell. Suggest additional ways to reduce the memory consumption of your implementations further. Then, calculate the amount of memory that they need to operate on gridworlds of size 1001 1001 and the largest gridworld that they can operate on within a memory limit of 4 MBytes.