cozzie said:
My idea/ goal is to find a way to first “render” the aabb’s of my big occluders
This needs a correction, since taking it literally would not work.
You cannot use a bounding volume of an occluder, since the bound covers pixels the occluder does not.
You can only use a 'smaller' shape, one that misses some pixels the occluder covers, to guarantee we do not accidentally cull objects which are visible.
This is important, and the problem makes it hard, for example, to run an occlusion system at a lower resolution than the screen. Even if we use software rasterization, it's not trivial to guarantee we only render pixels which are fully covered, while it is easy to guarantee rendering all partially covered pixels ('conservative rasterization' on GPU).
One example of the problem is shared edges of occluder polygons. The pixels along such an edge are not fully covered by either polygon sharing it, so we don't draw them, which causes many holes, and the culling becomes almost ineffective.
To avoid the problem, we can try to ‘shrink’ the occluders, so they are smaller than the geometry and the potential error of single pixels is no problem in practice.
Or we can render at the same resolution as the actual screen (which you may intend anyway).
Or we could work with portals instead of occluders, where low resolution causes no accidental culling.
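To illustrate the low-resolution problem described above, here is a minimal sketch of the two rounding directions for one scanline. It assumes half-open integer pixel ranges and an integer downscale factor. The names `Range`, `shrinkToLowRes` and `growToLowRes` are made up for this example and do not come from any library.

```cpp
#include <cassert>

// Half-open pixel range [begin, end) on one scanline (hypothetical helper type).
struct Range {
    int begin;
    int end;
    bool empty() const { return begin >= end; }
};

// Integer division rounding toward negative infinity (d must be positive).
inline int floorDiv(int a, int d) {
    assert(d > 0);
    int q = a / d;
    return (a % d < 0) ? q - 1 : q;
}

// Integer division rounding toward positive infinity (d must be positive).
inline int ceilDiv(int a, int d) {
    assert(d > 0);
    int q = a / d;
    return (a % d > 0) ? q + 1 : q;
}

// Occluder side: keep only low-res pixels that the full-res range covers completely.
inline Range shrinkToLowRes(Range full, int scale) {
    return { ceilDiv(full.begin, scale), floorDiv(full.end, scale) };
}

// Occludee side: keep every low-res pixel the full-res range touches at all.
inline Range growToLowRes(Range full, int scale) {
    return { floorDiv(full.begin, scale), ceilDiv(full.end, scale) };
}
```

With a scale of 4, two occluder polygons meeting at x = 6 give full-res ranges [0, 6) and [6, 12). Shrinking them yields low-res ranges [0, 1) and [2, 3), so low-res pixel 1 stays open although the occluders together cover it. That is the kind of hole along shared edges mentioned above.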
Coming back to 'AABB does not work': this means you have to create the occluder geometry somehow, e.g. by modelling a low-detail box representation of your scene, manually or automatically.
cozzie said:
maybe software depth buffer
Personally, I did it with software rasterization on the CPU.
A full depth buffer is not needed, and neither is drawing (or processing) single pixels.
Instead we can use 'spans' (basically a segment of a horizontal scanline defined by two points; newly inserted spans may intersect and modify existing ones).
Or we could use no frame buffer at all, but just polygon clipping.
But spans scale better with detail, and it's also easier to implement.
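To make the span idea a bit more concrete, here is a minimal sketch of a coverage-only span list for a single scanline. It is a simplification and not the implementation described above: it ignores depth entirely and assumes occluders are inserted before the things they could hide. The names `Span` and `ScanlineCoverage` are invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Half-open covered interval [begin, end) on one scanline (hypothetical type).
struct Span {
    int begin;
    int end;
};

class ScanlineCoverage {
public:
    // Mark [s.begin, s.end) as covered, merging with overlapping or touching spans.
    void insert(Span s) {
        assert(s.begin <= s.end);
        if (s.begin == s.end) return;  // nothing to cover
        std::vector<Span> merged;
        merged.reserve(spans_.size() + 1);
        bool placed = false;
        for (const Span& e : spans_) {
            if (e.end < s.begin) {
                merged.push_back(e);  // entirely left of the new span
            } else if (e.begin > s.end) {
                if (!placed) { merged.push_back(s); placed = true; }
                merged.push_back(e);  // entirely right of the new span
            } else {
                // Overlapping or touching: grow the new span to absorb it.
                s.begin = std::min(s.begin, e.begin);
                s.end = std::max(s.end, e.end);
            }
        }
        if (!placed) merged.push_back(s);
        spans_ = std::move(merged);
    }

    // True if every pixel of q is already covered. Stored spans are maximal,
    // so a covered query must lie inside a single stored span.
    bool isCovered(Span q) const {
        if (q.begin >= q.end) return true;  // an empty query is trivially covered
        for (const Span& e : spans_) {
            if (e.begin <= q.begin && q.end <= e.end) return true;
        }
        return false;
    }

    std::size_t spanCount() const { return spans_.size(); }

private:
    std::vector<Span> spans_;  // sorted, disjoint, never touching
};
```

A full occlusion buffer would hold one such list per scanline, insert the spans of each occluder polygon, and test the screen-space spans of an AABB with `isCovered`. The cost depends on the number of spans, not on the number of pixels.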
cozzie said:
So far I have a quadtree spatial division which helps quite a bit
I use an octree, containing both the occluder geometry and the AABBs of the visible geometry.
The tree is traversed so that nodes are processed in roughly front-to-back order.
Occluders are drawn as spans. AABBs are checked for visibility against the spans. If they are occluded, the entire subtree can be skipped. This means there is no more need to draw any occluders or geometry in that subtree, so it's very work efficient compared to other solutions. It also supports dynamic occluders, although for that a BVH would be faster than an octree.
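The traversal described above could look roughly like the following sketch. It is an illustration under assumptions, not the actual implementation: the node layout, the `Callbacks` struct, and sorting children by distance to the box center are all invented here, and only the node bounds are tested, not per-object AABBs. The visibility test is passed in as a callback, so any occlusion buffer can be plugged in.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

struct Vec3 { float x, y, z; };
struct Aabb { Vec3 min, max; };

// Hypothetical tree node: holds ids of occluders and renderable objects.
struct Node {
    Aabb bounds;
    std::vector<int> occluderIds;
    std::vector<int> objectIds;
    std::vector<std::unique_ptr<Node>> children;
};

// Hooks into the occlusion buffer and renderer (all assumed for this sketch).
struct Callbacks {
    std::function<bool(const Aabb&)> isVisible;  // test a box against the buffer
    std::function<void(int)> drawOccluder;       // rasterize an occluder into it
    std::function<void(int)> emitVisible;        // report an object as visible
};

// Squared distance from a point to the center of a box.
inline float distSqToCenter(const Aabb& b, const Vec3& p) {
    float cx = 0.5f * (b.min.x + b.max.x) - p.x;
    float cy = 0.5f * (b.min.y + b.max.y) - p.y;
    float cz = 0.5f * (b.min.z + b.max.z) - p.z;
    return cx * cx + cy * cy + cz * cz;
}

inline void traverse(const Node& node, const Vec3& eye, const Callbacks& cb) {
    // If the node's box is hidden, nothing inside it needs any work.
    if (!cb.isVisible(node.bounds)) return;

    for (int id : node.occluderIds) cb.drawOccluder(id);
    for (int id : node.objectIds) cb.emitVisible(id);

    // Visit children nearest first, so near occluders can hide far subtrees.
    std::vector<const Node*> order;
    order.reserve(node.children.size());
    for (const auto& child : node.children) order.push_back(child.get());
    std::sort(order.begin(), order.end(), [&](const Node* a, const Node* b) {
        return distSqToCenter(a->bounds, eye) < distSqToCenter(b->bounds, eye);
    });
    for (const Node* child : order) traverse(*child, eye, cb);
}
```

Sorting by center distance only gives an approximate front-to-back order, which matches the 'roughly' above. Occluders that end up drawn later than ideal cost some culling efficiency but cannot cause wrong culling.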
Downsides are: The algorithm can't be parallelized effectively, so it's not a good fit for the GPU.
So I'm not really happy with it, and will look for alternatives. It's also a lot of work, so I'm not sure I'd recommend a CPU solution.
Regarding modern GPU implementations, this seems good to me:
https://medium.com/@mil_kru/two-pass-occlusion-culling-4100edcad501