OpenGL Best Mouse Picking Technique

12 comments, last by skatehumor 3 years, 2 months ago

I wanted to get some advice on what the best techniques on mouse picking would be.

To set up the context, I am building an editor for my engine, and I'm working on adding transformation gizmos to translate/scale/rotate game objects with ease. I'd like to implement some sort of mouse picking that works well with the editor objects and the transformation gizmos.

I'm using Bullet as my physics engine for now, and I'm aware that there's a decent solution that leverages the physics engine (i.e. casting a ray and seeing if it hits any colliders along the way, then selecting the object corresponding to the first hit). I like this solution since rays would only be cast on mouse clicks, so it seems performant enough, but it doesn't quite scratch my itch when I consider that I would have to add a non-user-facing rigidbody/collider combo to every game object in the editor scene, even if that object will never have its own physics support. It seems like a lot of extra code to make sure the rigidbody/collider combo doesn't persist on the object in a game build and doesn't clash with any other colliders attached to the game object.
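For what it's worth, the picking logic itself is engine-agnostic: cast one ray, keep the closest hit. Below is a minimal sketch using hypothetical bounding-sphere colliders (all names are mine, not from a library); Bullet's `btCollisionWorld::rayTest` with a `btCollisionWorld::ClosestRayResultCallback` gives you the equivalent result directly:

```cpp
#include <cmath>
#include <optional>
#include <vector>

struct Vec3 { float x, y, z; };
static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Hypothetical stand-in for a collider: a bounding sphere plus an object id.
struct SphereCollider { Vec3 center; float radius; int objectId; };

// Returns the distance t along the ray to the sphere, or nothing on a miss.
static std::optional<float> raySphere(Vec3 origin, Vec3 dir, const SphereCollider& s) {
    Vec3 oc = sub(origin, s.center);
    float b = dot(oc, dir);                      // dir assumed normalized
    float c = dot(oc, oc) - s.radius * s.radius;
    float disc = b * b - c;
    if (disc < 0.0f) return std::nullopt;
    float t = -b - std::sqrt(disc);
    if (t < 0.0f) return std::nullopt;           // sphere is behind the ray
    return t;
}

// Closest-hit picking: the object with the smallest hit distance wins.
static int pick(Vec3 origin, Vec3 dir, const std::vector<SphereCollider>& scene) {
    int best = -1;
    float bestT = INFINITY;
    for (const auto& s : scene)
        if (auto t = raySphere(origin, dir, s))
            if (*t < bestT) { bestT = *t; best = s.objectId; }
    return best; // -1 means nothing was hit
}
```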

I'm also aware that color picking is another solution (I'm starting to lean more in this direction), but it would require an extra render pass to populate a buffer with the color/ID data I need. I've never tried color picking, so I'm not sure whether I would have to perform this render pass every frame; if so, it wouldn't be great for performance anyway.
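One mitigating detail: the ID pass only needs to run when the mouse is clicked, not every frame. You render the scene once with each object drawn in a flat color encoding its ID, then read back the single pixel under the cursor with glReadPixels. The ID/color mapping itself is plain byte packing; a sketch:

```cpp
#include <array>
#include <cstdint>

// Pack a 24-bit object id into an RGB8 color for the picking pass.
static std::array<uint8_t, 3> idToColor(uint32_t id) {
    return { uint8_t(id & 0xFF), uint8_t((id >> 8) & 0xFF), uint8_t((id >> 16) & 0xFF) };
}

// Recover the id from the pixel read back with glReadPixels.
static uint32_t colorToId(std::array<uint8_t, 3> rgb) {
    return uint32_t(rgb[0]) | (uint32_t(rgb[1]) << 8) | (uint32_t(rgb[2]) << 16);
}
```

This supports up to 16.7 million distinct IDs; reserve one color (e.g. 0) for "no object".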

I'd like to know if I'm going in the right direction here, or if there are any other methods for mouse picking that would be better suited for an editor-style interface.


skatehumor said:
To set up the context, I am building an editor for my engine, and I'm working on adding transformation gizmos to translate/scale/rotate game objects with ease.

I did this recently too, and after moving to Vulkan there was no longer a way to get GPU picking without some work.
So I did it with raytracing on the CPU. I use the scene graph for acceleration, and I actually trace only the bounding boxes of objects, which is fine for now. My meshes have a BVH, so I can easily extend this if necessary, but because we only ever trace one ray for picking, brute forcing all triangles after a bounding box hit would be fast enough as well.
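The bounding-box trace described above boils down to the classic slab test. A minimal sketch, assuming a simple Vec3 type and a ray direction whose per-component reciprocal is precomputed once per ray:

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };

// Slab test: returns true if the ray (origin, invDir) hits the AABB [bmin, bmax].
// invDir holds 1/dir per component; IEEE infinities handle axis-parallel rays.
static bool rayAabb(Vec3 origin, Vec3 invDir, Vec3 bmin, Vec3 bmax) {
    float t1 = (bmin.x - origin.x) * invDir.x, t2 = (bmax.x - origin.x) * invDir.x;
    float tmin = std::min(t1, t2), tmax = std::max(t1, t2);
    t1 = (bmin.y - origin.y) * invDir.y; t2 = (bmax.y - origin.y) * invDir.y;
    tmin = std::max(tmin, std::min(t1, t2)); tmax = std::min(tmax, std::max(t1, t2));
    t1 = (bmin.z - origin.z) * invDir.z; t2 = (bmax.z - origin.z) * invDir.z;
    tmin = std::max(tmin, std::min(t1, t2)); tmax = std::min(tmax, std::max(t1, t2));
    return tmax >= std::max(tmin, 0.0f); // tmax >= 0: box is not entirely behind the ray
}
```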

However, the interesting thing I've experienced is this: it gave me much better gizmos than before, when I was using GL picking. This is because I now have precise control over tolerances, e.g. how far the cursor can be from some gizmo circle. I can use different tolerances for different controls. And I can also process hits in order, e.g. if the cursor hits both a transform arrow and a rotation circle of the gizmo, I can prefer the one which is usually harder to select.

I can also pick occluded stuff now. I can hold the cursor over a group of many objects, and all of them become highlighted. Then I use the mouse wheel to scroll through this list sorted by distance, and select with a click. It's now very easy to select things that were difficult before.

So those are some hidden advantages I hadn't thought about before, and I'm quite happy with the solution. It's not really much more code either, and it actually feels less bloated and hacky than using GL picking. Tracing the scene has some other uses as well.

@joej That sounds like a solid solution. I've been slowly building a BVH for game objects within my engine for a while now, but so far I only have AABBs around objects. I'll definitely look into this way of doing it though. Are you using OBBs as well for picking? I'd imagine those would provide a bit more accuracy.

I've thought about this in the past, and I believe a physics engine would provide me with similar functionality, since most physics engines have their own scene graph implementations in order to process collisions efficiently. But as I said, I wouldn't want to add colliders to every single thing in the scene, especially to objects that don't need them (i.e. gizmos). Having my own scene graph would probably be worthwhile too, especially for these kinds of use cases.

This question might be too open ended, but how did you go about building your scene graph/BVH implementation?

I use AABBs for the first check, then OBBs. I mostly have thousands of SDF primitives in the scene, and performance is really no problem for that single ray, even if it hits hundreds. But yes, accuracy is a problem. I lack ray intersection functions for most of my primitives, so picking them just from the bounding box sucks.
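An OBB doesn't need its own intersection routine: transform the ray into the box's local frame and run the ordinary slab test there. A sketch, assuming the box stores three orthonormal local axes (names are my own):

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };
static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// An OBB as center, half extents, and three orthonormal local axes.
struct Obb { Vec3 center, halfExtent, axis[3]; };

// Project the ray into the box frame, then slab-test against [-e, +e] per axis.
static bool rayObb(Vec3 origin, Vec3 dir, const Obb& box) {
    Vec3 rel = sub(origin, box.center);
    float o[3], d[3], e[3] = {box.halfExtent.x, box.halfExtent.y, box.halfExtent.z};
    for (int i = 0; i < 3; ++i) { o[i] = dot(rel, box.axis[i]); d[i] = dot(dir, box.axis[i]); }
    float tmin = -INFINITY, tmax = INFINITY;
    for (int i = 0; i < 3; ++i) {
        if (std::fabs(d[i]) < 1e-8f) {
            if (std::fabs(o[i]) > e[i]) return false; // parallel to the slab and outside it
            continue;
        }
        float t1 = (-e[i] - o[i]) / d[i], t2 = (e[i] - o[i]) / d[i];
        tmin = std::max(tmin, std::min(t1, t2));
        tmax = std::min(tmax, std::max(t1, t2));
    }
    return tmax >= std::max(tmin, 0.0f);
}
```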

Exact tracing of triangles is surely worth it, but if it's only for picking, I would not mess with a BVH and would first see how brute force does.
If you do skinning on the GPU and still want to select characters precisely, you would need to port skinning to the CPU just for picking, which is maybe more effort than doing picking on the GPU.
Also, if you have a BVH, you need to refit it with animation. Not really a problem, but code piles up…

skatehumor said:
I've thought about this in the past, and I believe a physics engine would provide me with similar functionality, since most physics engines have their own scene graph implementations in order to process collisions efficiently

Yeah, but I agree that's probably more complexity than doing the raytracing yourself.

I use a regular tree for the scene. It's just one tree to handle everything: transform hierarchy, order of CSG operations, some logical grouping from the user, and spatial subdivision. So far this simple approach seems to work. Adding the raytracing for bbox picking was an hour of work, at most. (The gizmo, though, was a lot of work : ) )

skatehumor said:
This question might be too open ended, but how did you go about building your scene graph/BVH implementation?

My BVH is per mesh. Actually, I use it only for voxelization, which is a preprocessing step and does not affect the editor yet.

The scene tree is more interesting. There is no standard way to do spatial subdivision; I do this only on demand. For example, I use a fluid simulator, which uses a sparse grid. When converting fluid particles to editor objects, I just replicate the hierarchy of the sparse grid (basically an octree) in the scene tree. This way I get the same spatial acceleration in the editor as well.
The scene tree computes bounding boxes for everything (so it is a BVH, to be precise), which are then used to locate stuff quickly.

If I were to build a city, my manually made hierarchy would likely work for spatial subdivision too. Or I'll add a tool to insert spatial subdivision nodes automatically.

However, my editor is only about world generation, not gameplay logic or animation. So my scope covers only a subsystem of a complete game engine editor.

Awesome, thanks for the tips! Currently I only use AABBs for camera frustum culling, so I don't even have to process objects outside the camera's FOV, but I'm starting to see more useful applications for acceleration structures in game engines. I will definitely look into using a BVH or maybe even an octree as a general acceleration structure.

JoeJ said:
Adding the raytracing for bbox picking was an hour of work, at most. (The gizmo, though, was a lot of work : ) )

It definitely takes a while. I've been looking into this for a few days now, trying to find the most effective way of making custom gizmos. I think it's worth the effort though.

skatehumor said:
I will definitely look into using a BVH or maybe even an octree as a general acceleration structure.

In general, a BVH is much more flexible. You can refit a BVH for dynamic stuff, while with an octree you always need to rebuild from scratch, because it depends on a fixed global grid.

Also, with a standard octree you need to put one object into multiple leaf nodes, so you get a std::vector (or similar) per node.
With a BVH (or a loose octree) you only have to store a pointer to the first object, and that object can point to the next. Usually such pointer chasing is still much better than dynamic allocation for many variable-sized lists.
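That "pointer to the first object, object points to the next" layout is an intrusive singly linked list. A minimal sketch of the idea (names are my own, not from any library):

```cpp
// Each object carries the link itself, so a node needs no per-node allocation.
struct Object {
    int id = 0;
    Object* nextInNode = nullptr; // intrusive link to the next object in the same node
};

struct Node {
    Object* firstObject = nullptr; // head of this node's object list
};

// O(1) insertion: push the object at the front of the node's list.
static void addToNode(Node& node, Object& obj) {
    obj.nextInNode = node.firstObject;
    node.firstObject = &obj;
}

// Iterating costs one pointer chase per object; no std::vector per node.
static int countObjects(const Node& node) {
    int n = 0;
    for (Object* o = node.firstObject; o; o = o->nextInNode) ++n;
    return n;
}
```

The trade-off: an object can live in only one node's list at a time, which is exactly the property a BVH (or loose octree) guarantees.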

In OpenGL there have been some utility functions for this for years: gluProject and gluUnProject.

Both functions convert points between 2D screen space and 3D world space, so you are free to decide whether you want to pick 3D objects in screen space or turn the cursor into world space and perform a bounding volume check. I know the days of GLU are gone and one shouldn't use it anymore, but there are large portions of these functions you could easily reimplement in your own editor code.

They just require your current transform (camera) matrix, the projection matrix, and the viewport vector. Even though there are already excellent hints in this topic, I just wanted to mention that as well.

@joej I too think BVHs would be best suited for this use case. From what I understand they can start to become inefficient if bounding volumes are updated frequently and the tree becomes unbalanced (after all, you want some sort of balance guarantee, otherwise a raycast degenerates into traversing the entire list of nodes). They also suffer when objects are far apart and the bounding volumes become large and sparse, I think. In any case, it'll probably be worth a shot as opposed to a full-blown octree approach.

@shaarigan I appreciate the tip! I already have a bit of code in the engine to convert screen NDC → world coordinates, but it might be worth looking into the implementation of those functions. Every bit helps.
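For reference, the core of gluUnProject is just a multiply by the inverse of projection * view followed by the perspective divide. A sketch, assuming the inverse matrix is already available:

```cpp
struct Vec4 { float x, y, z, w; };

// Column-major 4x4 matrix, m[column][row], matching OpenGL conventions.
struct Mat4 { float m[4][4]; };

static Vec4 mul(const Mat4& a, Vec4 v) {
    Vec4 r;
    r.x = a.m[0][0] * v.x + a.m[1][0] * v.y + a.m[2][0] * v.z + a.m[3][0] * v.w;
    r.y = a.m[0][1] * v.x + a.m[1][1] * v.y + a.m[2][1] * v.z + a.m[3][1] * v.w;
    r.z = a.m[0][2] * v.x + a.m[1][2] * v.y + a.m[2][2] * v.z + a.m[3][2] * v.w;
    r.w = a.m[0][3] * v.x + a.m[1][3] * v.y + a.m[2][3] * v.z + a.m[3][3] * v.w;
    return r;
}

// NDC (x, y, z each in [-1, 1]) back to world space via the precomputed
// inverse of projection * view, including the perspective divide.
static Vec4 unproject(float ndcX, float ndcY, float ndcZ, const Mat4& invViewProj) {
    Vec4 p = mul(invViewProj, {ndcX, ndcY, ndcZ, 1.0f});
    return {p.x / p.w, p.y / p.w, p.z / p.w, 1.0f};
}
```

Unprojecting the cursor at ndcZ = -1 and ndcZ = 1 yields the near and far points of the pick ray.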

skatehumor said:
@joej I too think BVHs would be best suited for this use case. From what I understand they can start to become inefficient if bounding volumes are updated frequently and the tree becomes unbalanced (after all, you want some sort of balance guarantee, otherwise a raycast degenerates into traversing the entire list of nodes). They also suffer when objects are far apart and the bounding volumes become large and sparse, I think. In any case, it'll probably be worth a shot as opposed to a full-blown octree approach.

Yeah, that's all somewhat true, but it rarely matters much. It would matter if you wanted to do a ray traced game, but most likely you want queries like 'find all objects intersecting this box', and for those tree quality is not so important. What we want is the better time complexity of log(n), and we get that no matter which exact tree we use. All the talk about 'the absolute fastest and coolest acceleration structure' comes from improving details, which is all that's left to discuss and research further. It is a form of nitpicking after the largest profit has already been taken by using just 'some tree' (no matter what we call it).

There are infinite details, but it all depends on the application and data. If I wrote a paper about some new fancy BVH, I could likely beat all the previous methods. Within my specific test case and implementation : )

To give an example, there were always a lot of worries that realtime RT would never work because 'building acceleration structures is too slow'. But that was never true: the simple idea of rebuilding the TLAS per frame but only refitting the BLAS, as used now in DXR, was known to solve this even decades ago. Many people were just unaware. The cause is probably the wide adoption and success of quadtrees and octrees, which do not support refitting. Games were often about grids, so those structures became the natural first choice for game developers. There's a lot of related confusion, bad tutorials, and even the constant suggestion that 'trees are not cache efficient and modern HW is super fast, so you are better off doing brute force' (which is often true, of course).

But we can make some general statements that make the choice easy, mostly:

An octree is good if your data is grid-aligned cubes, or points with no volume. E.g. Minecraft or particles.

A loose octree is good if objects have different sizes and volumes.

A BVH is a generalization of the loose octree, removing the dependency on a global grid (and the constraint of having exactly 8 children). Thus we can do refitting, making it friendly for realtime updates. Of course, refitting only makes sense if objects stay connected (a character skeleton); it makes less sense for particles which move in all directions independently.
We can even decide to enlarge the initial bounds so they bound all animation of a character. This way we do not even need refitting: just transform the tree, with no dependencies between levels of the tree. That gives a super fast tree update, but slower tracing and queries.
But we always need to rebuild the top levels, because even characters behave like particles if we look at many of them from some distance. Usually that's no problem, because we don't have that many dynamic objects. This TLAS rebuild thus solves the problem of 'bad quality trees'.
The free branching factor also helps to tune for the best compromise between cache-efficient brute force and a work-efficient tree. E.g. you could decide each node can have 64 children instead of just 4 or 8. This means fast iteration over objects in linear memory (if they are sorted into node order), and it significantly reduces the number of tree levels, fighting cache misses.

Typical downsides of BVH vs. octree:

Need to visit multiple children for a point query, because child bounds may overlap.
Need to test all children if you want to process them in front-to-back order, e.g. for raytracing or occlusion culling.
Need to iterate objects twice when building - once to find the dimensions and child counts of the new nodes, and a second time to sort the objects in.

But those downsides also apply if we have objects of variable size and would thus choose a loose octree, so nothing is lost.
I think the BVH is now the standard for both collision detection and ray tracing almost everywhere.
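The refit mentioned above can be a single backwards sweep if nodes are stored with each parent before its children. A sketch under that assumption (structure names are my own):

```cpp
#include <algorithm>
#include <vector>

struct Aabb { float min[3], max[3]; };

static Aabb merge(const Aabb& a, const Aabb& b) {
    Aabb r;
    for (int i = 0; i < 3; ++i) {
        r.min[i] = std::min(a.min[i], b.min[i]);
        r.max[i] = std::max(a.max[i], b.max[i]);
    }
    return r;
}

struct Node {
    Aabb bounds;
    int parent;   // -1 for the root (stored at index 0)
    bool isLeaf;  // leaf bounds are refreshed from the animated object beforehand
};

// Refit: leaves already hold fresh bounds; walking the array backwards folds
// every child into its parent before that parent is itself folded upward.
static void refit(std::vector<Node>& nodes) {
    for (auto& n : nodes)
        if (!n.isLeaf) n.bounds = {{1e30f, 1e30f, 1e30f}, {-1e30f, -1e30f, -1e30f}};
    for (int i = int(nodes.size()) - 1; i > 0; --i)
        nodes[nodes[i].parent].bounds = merge(nodes[nodes[i].parent].bounds, nodes[i].bounds);
}
```

This is O(n) with one pass and no recursion, which is why refitting is so cheap compared to a full rebuild.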

That's a lot of useful info. Thanks a ton! I've never built a BVH in code before. I'm only familiar with the theoretical concepts and am attempting my first foray now.

So, from my understanding, what I'm basically making is an n-ary balanced tree with configurable n, where leaf nodes are determined by the presence of primitives/meshes, and the tree has an added intersection routine (or several). I can either keep the spatialization separate or build it into the scene graph, though I would prefer to keep it as a separate data structure for the sake of configurability.
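A build along those lines can be sketched as a recursive median split over object centroids. This is a binary split for brevity; an n-ary node would split into n buckets instead. All names here are my own, not from any library:

```cpp
#include <algorithm>
#include <vector>

struct Prim { float center[3]; int id; };

struct BuildNode {
    float bmin[3], bmax[3];             // bounds over contained centroids
    std::vector<BuildNode> children;    // empty for a leaf
    std::vector<Prim> prims;            // filled only for leaves
};

// Recursive median split: bound the centroids, pick the widest axis,
// sort, halve, and recurse until few enough primitives remain.
static BuildNode build(std::vector<Prim> prims, size_t leafSize = 4) {
    BuildNode node;
    for (int i = 0; i < 3; ++i) { node.bmin[i] = 1e30f; node.bmax[i] = -1e30f; }
    for (const Prim& p : prims)
        for (int i = 0; i < 3; ++i) {
            node.bmin[i] = std::min(node.bmin[i], p.center[i]);
            node.bmax[i] = std::max(node.bmax[i], p.center[i]);
        }
    if (prims.size() <= leafSize) { node.prims = std::move(prims); return node; }
    int axis = 0; // widest extent wins
    for (int i = 1; i < 3; ++i)
        if (node.bmax[i] - node.bmin[i] > node.bmax[axis] - node.bmin[axis]) axis = i;
    std::sort(prims.begin(), prims.end(),
              [axis](const Prim& a, const Prim& b) { return a.center[axis] < b.center[axis]; });
    size_t mid = prims.size() / 2;
    node.children.push_back(build({prims.begin(), prims.begin() + mid}, leafSize));
    node.children.push_back(build({prims.begin() + mid, prims.end()}, leafSize));
    return node;
}
```

A production build would bound full primitive AABBs rather than centroids and use a surface area heuristic instead of the median, but the recursion shape is the same.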

In terms of rebuilding the hierarchy, I assume this would happen per frame (or at least when objects are transformed in the scene), and refitting would only occur if an object is rotated or scaled, which would also require rebuilding the BVH.

Does this sound right?

This topic is closed to new replies.
