Hello everyone,
I just wanted to share a, what I believe to be novel, broadphase collision detection technique that I stumbled over while working on my physics engine. If anyone knows of related work, please refer me!
The code will be at the end of this post, but first some discussion.
Briefly, the goal of a broadphase is to determine potentially colliding pairs of objects. There exist many well known techniques such as: uniform / hierarchical grids, bounding volume hierarchies, sweep and prune, or quadtrees. They all have their respective pros and cons, which I won't go into here. One thing they all have in common however, is that they require maintaining and building a data structure which facilitates the process of finding pairs.
Divide and conquer broadphase is based on the relatively new "Divide and conquer ray-tracing" technique. The idea is to recursively subdivide space, partitioning objects into these spaces, until there are sufficiently few objects in a space. When there are few enough objects, use a brute force method to find the potential pairs.
The great thing about this technique is, it requires no data structures to be built and it uses a fixed amount of additional memory. In addition, the code is surprisingly simple.
The code shown works in 2D. It is trivially extended to 3D or beyond by changing only the partitioning method to consider the additional dimensions.
Performance analysis:
Subjectively speaking, I have tested this method against many others in my own physics engine on a variety of scenes. It performs very nicely, surpassing the dBVT in many cases (especially when many objects are in motion). It rivals the speed of my current best, the uniform grid.
From an analytical point of view, the best case is O(n log n) when the objects are not overlapping, and the worst case O (n^2) when all objects occupy the same space. However, in typical physics simulation scenes, objects are nicely distributed and do not overlap much (assuming the constraint solver is doing its job). O(n log n) doesn't sound that great considering BVHs achieve the same on paper. But unlike BVHs, no structure needs to be maintained or updated, which causes it to perform better in practice.
Memory usage, as you will see, is minimal. The algorithm is run from scratch at the beginning of a time step.
Potential optimizations:
There are a few avenues for optimization. The first and most obvious being, unroll the recursion into a loop with an explicit stack in order to reduce the overhead of the recursive calls.
Another is to use a more intricate partitioning method, rather than the simple one shown here.
Note that I use a set to track pairs, since this method can result in duplicate pairs. The set is actually rather slow. You can easily swap this for a vector, then sort and remove duplicates at the end for a nice performance boost.
There exists a parallel version of "Divide and conquer ray-tracing". I'm sure the ideas presented there could be used to parallelize this technique as well, though I have not looked into it,
Lastly, one could potentially cache the partition locations, and use those as a starting point for computing the partition locations in the next frame. This exploits the temporal coherence of physics simulations. This could theoretically divide the cost of the whole process by 2, since computing the partition location is half of the work in each call.
Here is my implementation (feel free to use this however you wish):
#include <vector>
#include <set>
#include <algorithm>
// Divide and conquer broadphase
void dac_bp(const vector<aabb>& bounds, const vector<int>::iterator start, const vector<int>::iterator end, set<pair>& pairs)
{
const int BRUTE_FORCE_THRESH = 32;
if ((end - start) < BRUTE_FORCE_THRESH)
{
// Brute force check the remaining pairs in this interval
for (auto i = start; i != end; i++)
{
for (auto j = i + 1; j != end; j++)
{
if (bounds[*i].intersects(bounds[*j]))
{
pairs.insert(pair(*i, *j));
}
}
}
}
else
{
// Compute bounds of all boxes in this interval
float2 pmin(FLT_MAX, FLT_MAX), pmax(-FLT_MAX, -FLT_MAX);
for (auto i = start; i != end; i++)
{
pmin = min(pmin, bounds[*i].min);
pmax = max(pmax, bounds[*i].max);
}
// Choose the partition axis and partition location
float2 size = pmax - pmin;
int axis = (size[1] > size[0]);
float split = (pmin[axis] + pmax[axis]) * 0.5f;
// Partition boxes left and recurse
auto mid_left = partition(start, end, [split, axis, bounds](int i) { return bounds[i].min[axis] <= split; });
dac_bp(bounds, start, mid_left, pairs);
// Partition boxes right and recurse
auto mid_right = partition(start, end, [split, axis, bounds](int i) { return bounds[i].max[axis] >= split; });
dac_bp(bounds, start, mid_right, pairs);
}
}
// USAGE
// Build a vector containing bounds of all objects
vector<aabb> bounds = { /* fill with bounds of objects */ };
// Initialize indices
vector<int> indices;
for (size_t i = 0; i < bounds.size(); i++)
indices.push_back(i);
// Get all colliding pairs!
set<pair> pairs;
dac_bp(bounds, indices.start(), indices.end(), pairs);
Thanks for reading, and I hope someone finds this of use! If you have any further ideas for extending this, please share!