
Assignment 1 Search Solution


1    Environments and Algorithms

1.1    Generating Environment and Path Planning

Throughout the report and the code, the following conventions have been used:

    1. ‘Dimension X Dimension’ is the size of the maze, represented as a square array

    2. ‘P’ refers to the probability of each cell being blocked, i.e. occupied

    3. ‘Source’ refers to the starting point, i.e. the first cell of the maze

    4. ‘Destination’ is the goal, i.e. the last cell of the maze
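For concreteness, the convention above can be sketched as follows (a minimal illustration using NumPy; the function and variable names are ours, not necessarily those used in the submitted code):

```python
import numpy as np

def generate_maze(dim, p, seed=None):
    """Generate a dim x dim maze; 1 = blocked, 0 = free.

    The source (0, 0) and destination (dim-1, dim-1) are always kept free.
    """
    rng = np.random.default_rng(seed)
    maze = (rng.random((dim, dim)) < p).astype(int)
    maze[0, 0] = 0    # source is never blocked
    maze[-1, -1] = 0  # destination is never blocked
    return maze

maze = generate_maze(10, 0.2, seed=42)
```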

The image below shows a sample maze generated by our code; its representation is explained below.



















Figure 1: Sample Maze (dimension = 10, p = 0.2)

The figure is interpreted as follows:







    1. The start cell is represented in ‘Red’ colour (#FF0000)

    2. The end/destination/goal cell is represented in ‘Green’ colour (#228B22)

    3. Obstacles, i.e. occupied nodes, are represented in ‘Black’ colour (#000000)

    4. Nodes visited while the algorithm executes are represented in ‘Silver’ colour (#C0C0C0)

    5. Nodes left unvisited while the algorithm executes are represented in ‘Gold’ colour (#FFD700)

    6. The final path chosen by the algorithm is represented in ‘Blue’ colour (#00008B)

2    Analysis and Comparison

2.1    Find a map size (dim) that is large enough to produce maps that require some work to solve, but small enough that you can run each algorithm multiple times for a range of possible p values. How did you pick a dim?

















Figure 2: Output of DFS (L) vs BFS (R)

Figure 2 tabulates the time taken (in ms) for a range of ‘p’ and ‘dim’ values, for DFS and BFS respectively. Since most mazes become unsolvable for p > 0.4, we restrict ourselves to p-values below 0.4. Factors to consider when picking a dim:

    1. The dim needs to be large enough that there is more than one solution path, enabling us to visualise the difference between the results given by the various algorithms and compare them.









    2. The dim needs to be large enough that the expected time to solve (over a few randomly generated mazes) differs significantly across algorithms.

    3. The dim needs to be small enough that we don’t run out of memory. Based on the tabulation above, we can estimate the time taken and decide accordingly.
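The tabulation above can be reproduced with a simple timing harness along these lines (a sketch; `solve` stands in for any of the algorithms under test):

```python
import time

def time_solver(solve, maze, repeats=5):
    """Average wall-clock time (in ms) of solve(maze) over several runs."""
    total = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        solve(maze)
        total += time.perf_counter() - start
    return 1000 * total / repeats
```

We use `time.perf_counter` rather than `time.time` for its higher resolution.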

2.2    For p=0.2, generate a solvable map, and show the paths returned for each algorithm. Do the results make sense? ASCII printouts are fine, but good visualizations are a bonus.



























Figure 3: Output of DFS (dimension = 40, probability = 0.2)

The following can be observed from the results generated after implementing the algorithms:

    1. BFS always returns the shortest path, at the cost of exploring more nodes than DFS.

    2. DFS returns a non-optimal path; it does not necessarily give the shortest path, especially as the dimension of the maze increases.

    3. Both A* Manhattan and A* Euclidean explore nodes layer-wise by exploiting their heuristics. For this particular maze, we can see that A*
































Figure 4: Output of BFS (dimension = 40, probability = 0.2)


Euclidean reaches the solution with fewer nodes, although it does not guarantee the shortest path.

    4. We can see that bidirectional BFS is faster than BFS: there are more unvisited nodes in both directions, which enables it to reach the solution faster than regular BFS.
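The BFS behaviour described in point 1 can be sketched as follows (a minimal grid BFS with path reconstruction; names are illustrative, not the submitted code):

```python
from collections import deque

def bfs_path(maze):
    """Return the shortest path from (0,0) to (n-1,n-1) in a 0/1 grid, or None."""
    n = len(maze)
    start, goal = (0, 0), (n - 1, n - 1)
    parent = {start: None}
    fringe = deque([start])
    while fringe:
        x, y = fringe.popleft()
        if (x, y) == goal:
            # Reconstruct the path by walking parents back to the start.
            path, cur = [], goal
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < n and 0 <= ny < n and maze[nx][ny] == 0 and (nx, ny) not in parent:
                parent[(nx, ny)] = (x, y)
                fringe.append((nx, ny))
    return None  # unsolvable maze

maze = [[0, 0, 1],
        [1, 0, 0],
        [1, 1, 0]]
```

DFS differs only in using a stack (`list.pop()`) in place of the queue, which is why it returns a path without any shortest-length guarantee.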


















































Figure 5: Output of A* Euclidean (dimension = 40, probability = 0.2)



























Figure 6: Output of A* Manhattan (dimension = 40, probability = 0.2)














































Figure 7: Output of Bidirectional BFS (dimension = 40, probability = 0.2)





















2.3    Given dim, how does maze-solvability depend on p? For a range of p values, estimate the probability that a maze will be solvable by generating multiple mazes and checking them for solvability. What is the best algorithm to use here? Plot density vs solvability, and try to identify as accurately as you can the threshold p0 where for p < p0 most mazes are solvable, but for p > p0 most mazes are not solvable.

Figure 8 tabulates the solvability for a range of p-values given a dim value. To calculate solvability, 100 mazes were generated at random and the ratio of solvable mazes to the total is reported. Figure 9 is the





















Figure 8: Table: Maze P-value VS Solvability for a given dim value

maze p-value vs. solvability curve for a given dim value, plotted for a range of dim values from 5 to 30. For mazes with dimension greater than 5, we observe that most mazes are solvable only up to a p-value of 0.4; at p = 0.4 the solvability touches zero and remains there as the p-value increases further.

Best algorithm to use here: to check connectivity between two points in a graph/tree, DFS is better than BFS in most cases. DFS keeps going until it reaches the bottom and terminates as soon as it finds a path, whereas BFS spends time exploring all the children at each level in order to find the most optimal path. So, to merely check for solvability, DFS is the better approach.
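The estimation procedure can be sketched as follows (an iterative DFS reachability check, matching the argument above; the function names are illustrative, while the default trial count of 100 matches the report):

```python
import random

def is_solvable(maze):
    """Iterative DFS: does a free path exist from (0,0) to (n-1,n-1)?"""
    n = len(maze)
    if maze[0][0] or maze[n - 1][n - 1]:
        return False
    stack, seen = [(0, 0)], {(0, 0)}
    while stack:
        x, y = stack.pop()
        if (x, y) == (n - 1, n - 1):
            return True
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < n and 0 <= ny < n and maze[nx][ny] == 0 and (nx, ny) not in seen:
                seen.add((nx, ny))
                stack.append((nx, ny))
    return False

def estimate_solvability(dim, p, trials=100, seed=0):
    """Fraction of `trials` random mazes that are solvable."""
    rng = random.Random(seed)
    solved = 0
    for _ in range(trials):
        maze = [[1 if rng.random() < p else 0 for _ in range(dim)] for _ in range(dim)]
        maze[0][0] = maze[dim - 1][dim - 1] = 0
        solved += is_solvable(maze)
    return solved / trials
```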























Figure 9: Graph: Maze P-value VS Solvability for a given dim value


2.4    For p in [0, p0] as above, estimate the average or expected length of the shortest path from start to goal. You may discard unsolvable maps. Plot density vs expected shortest path length. What algorithm is most useful here?


Below is the tabulation of the mean path length (over 20 randomly generated solvable mazes) across a range of p × dim values.

NOTE: a 0 in the table below means the code could not generate a solvable maze within a fixed number of iterations. Due to computational limitations, we stopped after 20 attempts to generate a solvable maze.

It can be observed that BFS always gives a shorter expected path for all mazes





















Figure 10: Path Length for BFS (L) vs. DFS (R)

within the threshold p0, as compared to DFS.




























Figure 11: Path Length for A-star Euclidean (L) vs. A-star Manhattan (R)


Since the backend of both A* Euclidean and A* Manhattan is BFS (the heuristic is used only to decide the precedence of neighbours to be visited), the expected path length is the same as for BFS.

As bidirectional BFS returns the same path as BFS, we have not tabulated it here. From Figure 12 for DFS, we can infer that as the p-value increases, the path produced becomes longer. This is more clearly visible at higher dimensions.

















Figure 12: Density vs. Expected Path Length for DFS











2.5    Is one heuristic uniformly better than the other for running A-star? How can they be compared? Plot the relevant data and justify your conclusions.

Heuristics that give a better estimate, i.e. one closer to the actual path cost, usually perform better than others. In our view, A* with the Manhattan distance heuristic is better than A* with the Euclidean distance heuristic in terms of the number of nodes visited. By their nature, the Euclidean heuristic is more suited to continuous space, whereas the Manhattan heuristic matches the four-directional movement of the maze.
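For reference, the two heuristics can be written as follows (our illustrative signatures). On a 4-connected grid the Euclidean distance is never larger than the Manhattan distance, so it is the weaker lower bound:

```python
import math

def manhattan(cell, goal):
    """Manhattan distance: matches 4-directional grid movement."""
    return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

def euclidean(cell, goal):
    """Euclidean distance: also admissible, but a smaller lower bound on a grid."""
    return math.hypot(cell[0] - goal[0], cell[1] - goal[1])
```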

To justify this, an analysis of the number of nodes visited by each algorithm is shown in Figure 14. From these results, it can be clearly seen that A* Manhattan is less exhaustive in node exploration than A* Euclidean. The path length for both algorithms is the same, as indicated in Figure 13; thus path length cannot be used as a metric for comparing the two heuristics.

With respect to the time taken, we see no clear pattern, as it varies across dimensions and p-values.



















Figure 13: Path Length in Euclidean (L) vs. Manhattan (R)







































Figure 14: Number of nodes explored in Euclidean (L) vs. Manhattan (R)















Figure 15: Time Taken in Euclidean (L) vs. Manhattan (R)














Figure 16: A* Euclidean (L) and A* Manhattan (R)

2.6    Do these algorithms behave as they should?

Yes, the algorithms behave as they should. As expected, DFS takes a longer, non-optimal path, returning a solution that is not necessarily the shortest, while taking less time.







BFS gives the shortest path; it comparatively visits more nodes and takes more time.

A* makes use of a heuristic (both Euclidean and Manhattan) for exploration and traversal, covering far fewer nodes than BFS. A*-Manhattan visits fewer nodes than A*-Euclidean and returns the optimal shortest path in the least time, whereas A*-Euclidean visits more nodes and returns the optimal shortest path while consuming more time.

Bidirectional BFS is essentially two BFS searches running simultaneously from opposite ends. It covers far more nodes than the A* algorithms, but converges to a solution faster than regular BFS. Also, significantly more nodes remain unvisited in bidirectional BFS than in regular BFS.

2.7    For DFS, can you improve the performance of the algorithm by choosing what order to load the neighboring cells into the fringe?


The performance of DFS can be improved by choosing the order in which the neighbours are loaded into the fringe. DFS uses a stack data structure for the fringe. Since a stack is Last-In-First-Out (LIFO), if we first push the moves that take us away from the goal and then push the moves that take us closer to the goal, the closer moves are popped first and the total cost of the algorithm decreases. Thus, the order in which DFS visits neighbours matters.
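A sketch of this ordering (using Manhattan distance to the goal as the illustrative measure of "closer"; the farthest neighbour is pushed first so the closest is popped next):

```python
def dfs_ordered(maze):
    """DFS that pops the goal-ward neighbour first; returns nodes expanded, or None."""
    n = len(maze)
    goal = (n - 1, n - 1)
    stack, seen, expanded = [(0, 0)], {(0, 0)}, 0
    while stack:
        x, y = stack.pop()
        expanded += 1
        if (x, y) == goal:
            return expanded
        nbrs = [(nx, ny) for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1))
                if 0 <= nx < n and 0 <= ny < n and maze[nx][ny] == 0 and (nx, ny) not in seen]
        # Push the neighbour farthest from the goal first, so the closest
        # (by Manhattan distance) is pushed last and popped next.
        nbrs.sort(key=lambda c: abs(c[0] - goal[0]) + abs(c[1] - goal[1]), reverse=True)
        for c in nbrs:
            seen.add(c)
            stack.append(c)
    return None
```

On an empty maze this ordered DFS expands exactly the cells on one monotone path, never backtracking.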

2.8    On the same map, are there ever nodes that BD-BFS expands that A* doesn’t? Why or why not? Give an example, and justify.

Bi-directional BFS (BD-BFS) essentially runs BFS simultaneously from the two opposite ends. The inherent nature of BFS makes it expand every reachable node layer by layer; thus BFS tends to visit each node that is reachable in a given map.
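A minimal sketch of BD-BFS as described, alternating one BFS layer from each end until the frontiers meet (names are illustrative; this version only detects that a path exists):

```python
from collections import deque

def bd_bfs_meets(maze):
    """Alternate BFS layers from source and goal; True when the frontiers meet."""
    n = len(maze)
    if maze[0][0] or maze[n - 1][n - 1]:
        return False
    seen_s, seen_g = {(0, 0)}, {(n - 1, n - 1)}
    q_s, q_g = deque([(0, 0)]), deque([(n - 1, n - 1)])
    while q_s and q_g:
        for q, seen, other in ((q_s, seen_s, seen_g), (q_g, seen_g, seen_s)):
            for _ in range(len(q)):  # expand one full layer of this frontier
                x, y = q.popleft()
                if (x, y) in other:
                    return True  # the two frontiers met: a path exists
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if 0 <= nx < n and 0 <= ny < n and maze[nx][ny] == 0 and (nx, ny) not in seen:
                        seen.add((nx, ny))
                        q.append((nx, ny))
    return False
```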

The A* algorithm follows a heuristic-based approach in which nodes are expanded in order of their ‘f’ value, guided by an admissible, optimistic heuristic. The whole idea is to find the shortest path in the least amount of time under the guidance of the heuristic, i.e. it will not expand nodes whose heuristic value is not promising enough. Thus, unlike BFS, the objective of A* is not to visit all the reachable nodes.

Figures 17 and 18 show the nodes visited during path finding for BD-BFS and A*, independently. Here, the visited nodes are indicated in ‘silver’ and ‘pink’ (for BD-BFS), whereas the non-visited nodes are indicated in ‘yellow’.

The span of the silver and pink colours in the BD-BFS image clearly dominates the span of the silver colour in the A* images.























Figure 17: A* Euclidean (L) and A* Manhattan (R)





















Figure 18: Bi-directional BFS


Thus, it can be concluded that BD-BFS expands nodes that A* does not. This can also be justified using the graph below, which plots the number of nodes visited by the two algorithms.

2.9    Bonus: How does the threshold probability p0 depend on dim? Be as precise as you can.

The threshold probability ‘p0’ relates to dimension via ‘solvability’. Solvability is defined as the ratio of the number of solvable mazes to the total number of mazes generated. In part 2.3, we plotted the density vs. solvability graph to obtain ‘p0’ and found the threshold probability to be 0.4 for n = 100 mazes. Suppose instead that we generate only 10 mazes with p = 0.4 and all of them happen to be







solvable; this changes ‘p0’, clearly indicating the relationship between the two.

Thus, ‘p0’ relates to dimension via solvability.


3    Generating Hard Mazes

3.1    What local search algorithm did you pick, and why? How are you representing the maze/environment to be able to utilize this search algorithm? What design choices did you have to make to apply this search algorithm to this problem?

Out of the many local search algorithms available, we chose ‘Simulated Annealing’. This algorithm performs significantly better than ‘Hill Climbing’, which has the flaw of getting stuck in a local maximum without discovering the global maximum. Simulated annealing avoids this problem by introducing randomness into the search.
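A sketch of the simulated-annealing loop we describe (the cooling schedule, hardness function, and neighbour move shown here are illustrative assumptions, not the exact choices in our code):

```python
import math
import random

def anneal(state, hardness, neighbour, t0=1.0, t_min=1e-3, alpha=0.99, seed=0):
    """Generic simulated annealing: maximize hardness(state).

    `neighbour` returns a slightly perturbed copy of the state
    (e.g. a maze with one random cell toggled between free and blocked).
    """
    rng = random.Random(seed)
    current, best = state, state
    t = t0
    while t > t_min:
        cand = neighbour(current, rng)
        delta = hardness(cand) - hardness(current)
        # Always accept improvements; accept worse moves with probability e^(delta/t),
        # which shrinks as the temperature cools.
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current = cand
            if hardness(current) > hardness(best):
                best = current
        t *= alpha  # geometric cooling
    return best
```

For hard-maze generation, `hardness` might be the number of nodes a solver expands, with unsolvable candidates rejected.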

3.2    Unlike the problem of solving the maze, for which the ‘goal’ is well-defined, it is difficult to know if you have constructed the ‘hardest’ maze. What kind of termination conditions can you apply here to generate a hard, if not the hardest, maze? What kind of shortcomings or advantages do you anticipate from your approach?

Our algorithm terminates on either of two conditions:

    1. The temperature falls below the threshold value

    2. It fails to generate a solvable maze even after 5000 iterations

The advantage of this method is that, if the temperature is decreased slowly enough, the optimal state is reached; nearness to the global optimum increases with the number of iterations performed. However, this is also a potential disadvantage, as the algorithm has to run for a long time to approach the global optimum.

To make sure we generate a harder maze, we can run BFS, look at the length of the optimal path produced, and compare it against the expected path length for that probability-dimension combination of mazes (which we have already tabulated in the previous question).












4    What If The Maze Were On Fire?

This problem describes a dynamic maze scenario. In the previous problems, we had a static map of the maze and the algorithm spent some time computing the best possible path.

Here, the idea is to design a solution that incorporates the dynamic nature of the maze as the fire spreads to neighbouring tiles at every turn.

Thus, the preliminary change is the introduction of three cell states as opposed to the initial two-state model. The first step is to visualize the fire spreading by incorporating a probability factor for the spread of the fire. This visualization helps us infer that the fire spreads more rapidly than the pointer can move through the maze from the start point to the goal.
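The fire-spread step can be sketched as follows (one turn of spreading, where a free cell with k burning neighbours ignites with probability 1 - (1 - q)^k; the flammability rate q and the 0/1/2 state encoding are our assumptions):

```python
import random

FREE, BLOCKED, FIRE = 0, 1, 2

def advance_fire(maze, q, rng):
    """Return the maze after one turn of fire spread.

    A free cell with k burning neighbours ignites with probability 1 - (1-q)**k.
    """
    n = len(maze)
    nxt = [row[:] for row in maze]  # ignitions are based on the old state
    for x in range(n):
        for y in range(n):
            if maze[x][y] != FREE:
                continue
            k = sum(1 for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1))
                    if 0 <= nx < n and 0 <= ny < n and maze[nx][ny] == FIRE)
            if k and rng.random() < 1 - (1 - q) ** k:
                nxt[x][y] = FIRE
    return nxt
```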

As discussed, we have three factors to take into account; our aim is to stay at a safer spot at any given instant of time. A safer spot corresponds to nearness to the goal and maximum distance from the fire. Thus, the heuristic is based on the distance to the solution and the distance from the fire. At the same time, since the fire is spreading rapidly, we need to find the most optimal and shortest path. The proposed method is therefore DFS combined with a heuristic.
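The combined heuristic can be sketched as below (the trade-off weight w and the Manhattan distances are illustrative assumptions; lower scores are preferred):

```python
def fire_heuristic(cell, goal, fire_cells, w=1.0):
    """Score a cell: distance to the goal minus weighted distance
    to the nearest burning cell. Lower is better."""
    d_goal = abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])
    d_fire = min(abs(cell[0] - f[0]) + abs(cell[1] - f[1]) for f in fire_cells)
    return d_goal - w * d_fire
```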

After analysis, we can infer that the algorithm is greedy in nature and at times gets caught in the fire instead of reaching the goal.
































