
Pathfinding A* with multiple agents

Started by April 17, 2013 04:06 PM
4 comments, last by Emergent 11 years, 5 months ago

I have to implement pathfinding on the GPU as a university project. I am going to use A* (I have used it before for solving 8-puzzles).

However, I can't think of a way to make it efficient for a single agent. I probably have to simulate multiple agents to get a reasonable speed-up.

I read somewhere that I should spawn a thread for each agent, which seems reasonable.

However, how would the agents avoid each other? The world (a 2D map) would have to be updated each time one moves, which means I must have a "central" array that the changes are written to, and that will make performance suffer on a GPU (or will it?). I know this is an AI forum, so the question I am asking is: how can I update each agent's position with the fewest updates?

However, I can't think of a way to make it efficient for a single agent.

What do you mean? In terms of the final path generated, A* is perfectly efficient. Are you talking about the problem of stating the algorithm in terms amenable to the GPU?

I read somewhere that I should spawn a thread for each agent, which seems reasonable.

No, for most applications it's not a reasonable approach and is likely to cause far more problems than it will solve, especially given that you are doing GPU work.

On the whole this sounds like a very difficult problem. Recalculating paths to avoid other agents is non-trivial and has spawned several different variations on A* to make it work. A* itself is probably not very suited to GPU acceleration as it's an algorithm that has a lot of stages that depend on reading previous writes. Attempting to solve both these issues in one project is likely to be quite challenging.


In my experience, for actual game use, updating the A* search to account for other units is a bad idea. Doing it the simple way leads to far too much global information being broadcast to other agents. You might be able to do some shenanigans to make it more localized, but then you might as well use some sort of steering behavior to make smaller local adjustments anyway.

Not sure how much I can help you here with the GPU side of things, but I always use A* just to get the general path and let the agent's steering behaviors take care of the rest. Occasionally, I use information I've obtained through local observations to help me generate a different path if the one I'm currently using seems bad.

What do you mean? In terms of the final path generated, A* is perfectly efficient. Are you talking about the problem of stating the algorithm in terms amenable to the GPU?

Yes. I meant spawning a GPU thread to calculate the path for each agent. Then it would be necessary to "inform" the other agents (write to GPU global memory). Or could this sometimes be avoided? (Read below.)

I read somewhere that I should spawn a thread for each agent, which seems reasonable.

No, for most applications it's not a reasonable approach and is likely to cause far more problems than it will solve, especially given that you are doing GPU work.

On the whole this sounds like a very difficult problem. Recalculating paths to avoid other agents is non-trivial and has spawned several different variations on A* to make it work. A* itself is probably not very suited to GPU acceleration as it's an algorithm that has a lot of stages that depend on reading previous writes. Attempting to solve both these issues in one project is likely to be quite challenging.

I see. I have talked with the professor, and he has told me that it will be enough if I do some measurements (see how it scales with a varying number of agents, etc.) of an existing implementation (there is one on the NVIDIA website).

If I don't manage to get the existing implementation to work, I will write one of my own, even if the performance is going to be bad - I know it's going to be hard to beat the CPU here. I was thinking of some "tricks" to reduce GPU communication, like an agent not sending any data when it's guaranteed that no other agent can be at a given cell at a certain time.

(For example, if there are two agents on a 10x10 grid and each of them starts at an opposite side, there is no chance that after two steps both agents will be at the same cell.)
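That sort of guarantee can be made precise with a conservative distance test. A minimal sketch in Python (the function name and grid encoding are my own illustration, not from any GPU implementation): on a 4-connected grid each agent moves at most one cell per step, so two agents whose Manhattan distance exceeds 2 * steps cannot share a cell within that many steps.

```python
def may_collide(pos_a, pos_b, steps):
    """Conservative test: on a 4-connected grid each agent moves at most
    one cell per step, so two agents whose Manhattan distance exceeds
    2 * steps cannot occupy the same cell within that many steps."""
    dist = abs(pos_a[0] - pos_b[0]) + abs(pos_a[1] - pos_b[1])
    return dist <= 2 * steps

# Agents at opposite corners of a 10x10 grid cannot meet within 2 steps:
print(may_collide((0, 0), (9, 9), 2))  # False: no communication needed yet
print(may_collide((0, 0), (3, 1), 2))  # True: a collision is possible
```

An agent would only need to write its position to global memory while this test returns True for some other agent.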

As I recall, GPUs still make you execute a lockstep instruction sequence across 32 data sets in parallel.

Perhaps partial parallel processing for the step where you find the adjacent cells to add to the open list? In 2D that's 8 candidates on a regular grid, and 26 candidates on a regular 3D grid.

You may be able (for a single agent thread) to employ a heap-based priority queue, which keeps the best open-list candidates in order in the same memory space (to be processed in chunks in parallel). Adding entries back to the heap, unfortunately, would have to be sequential.
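For reference, here is a sequential Python sketch of that heap-backed open list inside a plain A* on a 4-connected grid (the grid encoding and names are illustrative, not from any particular implementation):

```python
import heapq

def astar(grid, start, goal):
    """A* on a 4-connected grid of 0 (free) / 1 (blocked) cells.
    The open list is a binary heap keyed on f = g + h, so the best
    candidate is always popped first; pushes are O(log n) and, as
    noted above, inherently sequential."""
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan
    open_heap = [(h(start), 0, start)]
    came_from = {start: None}
    g_score = {start: 0}
    while open_heap:
        f, g, node = heapq.heappop(open_heap)
        if g > g_score[node]:
            continue  # stale heap entry; a better g was found later
        if node == goal:  # reconstruct path back to start
            path = []
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1]
        r, c = node
        for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nb
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < g_score.get(nb, float("inf")):
                    g_score[nb] = ng
                    came_from[nb] = node
                    heapq.heappush(open_heap, (ng + h(nb), ng, nb))
    return None  # goal unreachable
```

The heap pop and push are exactly the serial bottleneck described above; only the neighbor generation inside the loop is embarrassingly parallel.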

A significant part of the A* processing will still be irregular (and will considerably lower how much of the GPU's parallel nature you can exploit).

You could more realistically have A* paths calculated in parallel for multiple independent objects (or for multiple targets being considered by the same agent/object).


There is a lot that you can do to parallelize pathfinding even for a single agent.

I would start by considering more basic algorithms than A*, and only incorporating some of A*'s features (prioritized sweeping, a heuristic) as you understand how to do the more basic algorithms in parallel.

Conceptual Starting Point #1: Breadth-first search

A* is basically a souped-up breadth-first search. You can do BFS in parallel: expand all the nodes in a ply in parallel, and synchronize between plies. Can you do that on the GPU?
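As an illustration, here is that ply-by-ply structure in Python (the graph encoding is my own); on a GPU, the loop over the frontier would become one thread per node, with a barrier between plies:

```python
def bfs_plies(neighbors, source):
    """Breadth-first search expressed ply by ply.  `neighbors` maps a
    node to its adjacent nodes.  Every node in the current frontier is
    independent of the others, so on a GPU each could be expanded by its
    own thread, with a synchronization barrier between plies.  Returns
    the distance from `source` for every reachable node."""
    dist = {source: 0}
    frontier = {source}
    ply = 0
    while frontier:
        ply += 1
        next_frontier = set()
        # On a GPU, this loop over the frontier is the parallel part.
        for node in frontier:
            for nb in neighbors[node]:
                if nb not in dist:
                    dist[nb] = ply
                    next_frontier.add(nb)
        frontier = next_frontier  # barrier: all expansions done before next ply
    return dist
```

The only cross-thread coordination needed is the "have I seen this node?" check, which is why frontier-based BFS maps onto GPUs far more naturally than the strict pop-one-best-node order of A*.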

Conceptual starting point #2: Value Iteration

Value Iteration is more general than A* and very easy to parallelize. In the case of the single-source shortest-path problem, it reduces to the following. A value d(v) is associated with each vertex v; it represents the distance to the goal. It is initialized,

d(goal) = 0

d(v) = Infinity for all v != goal

Then, you simply do the following:

d(v) := min_{u in neighbors(v)} [ edge_cost(v,u) + d(u) ]

E.g., if vertex v has three neighbors, u1, u2, and u3, then it looks at the three quantities,

edge_cost(v, u1) + d(u1)

edge_cost(v, u2) + d(u2)

edge_cost(v, u3) + d(u3)

and picks whichever one is smallest.

This minimization can be performed in parallel by all vertices. To do it, a vertex needs only to look at its neighbors' values. This can be done either in a "Jacobi style" (in which new values are written to a separate buffer, and buffers are swapped at the end of the iteration -- this is the classic Value Iteration), or in a "Gauss-Seidel style" (in which new values are written "in-place" back to the original buffer -- sometimes called "Asynchronous dynamic programming"). The point is that you have a great deal of flexibility in the order in which you do these updates, while still guaranteeing convergence. Dijkstra's algorithm and A* are "just" very, very careful optimizations of this basic iteration, first to avoid updating nodes that don't need updates, and then to bias the updates in the direction of the goal (while making certain guarantees).
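A Jacobi-style sweep can be sketched in a few lines of Python (the adjacency encoding and names are my own illustration):

```python
INF = float("inf")

def value_iteration(neighbors, edge_cost, goal, n_sweeps):
    """Jacobi-style value iteration for single-source shortest paths.
    Every vertex recomputes d(v) = min over u in neighbors(v) of
    edge_cost(v, u) + d(u), reading only the *old* buffer; the buffers
    are swapped after each sweep.  Each vertex update is independent,
    which is what maps onto one GPU thread per vertex."""
    d = {v: INF for v in neighbors}
    d[goal] = 0.0
    for _ in range(n_sweeps):
        new_d = {}
        for v in neighbors:  # each iteration is independent: parallel on GPU
            best = 0.0 if v == goal else INF
            for u in neighbors[v]:
                best = min(best, edge_cost(v, u) + d[u])
            new_d[v] = best
        d = new_d  # buffer swap: the "Jacobi" part
    return d
```

Writing into `new_d` instead of `d` is the Jacobi variant described above; writing back into `d` in place would give the Gauss-Seidel / asynchronous variant, which converges with fewer sweeps but makes the result depend on update order.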

Detail: Bidirectional search

Shortest path problems can be solved more quickly by bidirectional search. At the very least, you can perform these two searches in parallel (though this is more of "cpu-level parallelism" than "gpu-level parallelism").
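A sketch of the idea in Python, alternating one BFS ply from each end of an undirected, unweighted graph (my own illustration):

```python
from collections import deque

def bidirectional_bfs(neighbors, start, goal):
    """Two BFS frontiers, one grown from each end, expanded one full ply
    at a time in alternation; the search stops as soon as a newly
    generated node is already known to the other side.  Assumes an
    undirected graph.  Returns the shortest path length, or None."""
    if start == goal:
        return 0
    dist_s, dist_g = {start: 0}, {goal: 0}
    q_s, q_g = deque([start]), deque([goal])
    while q_s and q_g:
        # Expand one complete ply from each side in turn.  The two
        # searches are independent, so they can run on separate cores.
        for q, dist, other in ((q_s, dist_s, dist_g), (q_g, dist_g, dist_s)):
            for _ in range(len(q)):
                node = q.popleft()
                for nb in neighbors[node]:
                    if nb in other:  # the two frontiers have met
                        return dist[node] + 1 + other[nb]
                    if nb not in dist:
                        dist[nb] = dist[node] + 1
                        q.append(nb)
    return None  # start and goal are disconnected
```

Each side only explores to roughly half the path length, so the total number of expanded nodes is typically far smaller than a single full search.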

This topic is closed to new replies.
