Hi-Z usefulness in deferred pipelines


I'm trying to decide whether Hi-Z occlusion culling is useful for deferred pipelines or not (or, to be precise, useful enough).

Rendering depth prior to the G-Buffer, generating mip maps and then using the results is going to have some impact (and yes, you can do that in a more efficient way than doing it at the start of the current frame - I know) - and practically the only pass that gains something from it is the G-Buffer pass (assuming you don't do any further forward passes).

It won't help you with any other cameras. It surely can't help you with shadow map depth passes. It can't help with ray tracing in any way (where you already have an acceleration structure, which is superior to ‘just Hi-Z’), etc.

What do you think?

My current blog on programming, linux and stuff - http://gameprogrammerdiary.blogspot.com


Forget it. Even if it had any benefits, the juice isn't worth the squeeze. Deferred rendering has no significant benefits with the modern hardware we have today. If you have issues with polygon count at large resolutions, I can suggest using simple geometric LOD.

Geri said:
Deferred rendering has no significant benefits with the modern hardware we have today.

I wouldn't be so sure about this - Unreal Engine 5 uses a deferred renderer by default, and Unity HDRP gives both options (and both have various pros and cons). I have used a deferred renderer in my engine for quite a long time (there are specific reasons why) - which is why I specifically asked about the usefulness of Hi-Z occlusion culling for deferred renderers. It can help a bit with forward ones (reducing the amount of additional per-fragment computations - and in forward you do the majority of lighting computations during that pass) - but with deferred rendering I'm struggling to see a major benefit.

  • I was thinking about not drawing everything into the Hi-Z buffer - but then whoever works in the scene editor (later, an artist) needs to set up which objects are occluders - which is not what I want.
  • Using just specific objects like terrain may make some sense as long as you're in exterior parts of the world, but not in interiors (and I'd like to avoid separating interiors from exteriors - I'd prefer to have a 'seamless world').
  • Creating a simplified representation of the scene (through voxelization and further processing, ensuring each stored voxel is inside geometry, i.e. a proper occluder) and using that could be an option (voxels could be pre-computed and use static geometry only) - but scenes can get quite big (open-world… which kind of goes against this).
  • Use previous-frame data only (I know, not ideal - it will result in popping) - but it is “free” to some extent (I believe UE5 might be doing this).

These are just a few ideas that I got over the past day for occlusion culling (and its setup being somewhat sane for a tiny team).

My current blog on programming, linux and stuff - http://gameprogrammerdiary.blogspot.com

Vilem Otte said:
I wouldn't be so sure about this

I am quite sure about this, because I have been using a deferred-like tiled rendering engine for my 2D sprite renderer for a decade, and day by day I get reminded, just by looking at the code, why I shouldn't. Though my sprite renderer has to work on 100 MHz-ish embedded systems in the end, so I must keep the technology alive no matter what. For you, there's a 99% chance it's not ideal.

I work at a AAA studio with a proprietary engine and a deferred renderer. We use Hi-Z depth for occlusion-culling by reprojecting the previous frame's depth (with hole-filling heuristics) to save significant CPU perf and moderate GPU perf. This of course can cause occasional pops, but they are fairly rare in normal gameplay and worth the performance benefit. We currently cull entire meshes, but could conceivably improve things by doing finer-grained culling of meshlets/LODs or even individual triangles.
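
To make the idea concrete, here is a rough CPU-side sketch of the reprojection step (an illustration, not our actual shader; the Vec4/Mat4 helpers and the 0 = near / 1 = far depth convention are assumptions, and holes are simply left at the far plane, which is the conservative choice):

#include <algorithm>
#include <vector>

struct Vec4 { float x, y, z, w; };
struct Mat4 { float m[16]; };   // row-major

static Vec4 Mul(const Mat4& a, const Vec4& v)
{
    return { a.m[0]*v.x  + a.m[1]*v.y  + a.m[2]*v.z  + a.m[3]*v.w,
             a.m[4]*v.x  + a.m[5]*v.y  + a.m[6]*v.z  + a.m[7]*v.w,
             a.m[8]*v.x  + a.m[9]*v.y  + a.m[10]*v.z + a.m[11]*v.w,
             a.m[12]*v.x + a.m[13]*v.y + a.m[14]*v.z + a.m[15]*v.w };
}

// Scatter the previous frame's depth into the current frame. Pixels nothing lands on
// keep far depth (1.0), i.e. they occlude nothing.
std::vector<float> ReprojectDepth(const std::vector<float>& prevDepth, int w, int h,
                                  const Mat4& prevInvViewProj, const Mat4& curViewProj)
{
    std::vector<float> out(w * h, 1.0f);            // start with "holes" = far plane
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
        {
            float d = prevDepth[y * w + x];
            if (d >= 1.0f) continue;                // sky / background
            // Pixel center -> previous-frame NDC -> world space.
            float ndcX = ((x + 0.5f) / w) * 2.0f - 1.0f;
            float ndcY = ((y + 0.5f) / h) * 2.0f - 1.0f;
            Vec4 world = Mul(prevInvViewProj, { ndcX, ndcY, d, 1.0f });
            world = { world.x / world.w, world.y / world.w, world.z / world.w, 1.0f };
            // World space -> current-frame clip space.
            Vec4 clip = Mul(curViewProj, world);
            if (clip.w <= 0.0f) continue;
            float u = (clip.x / clip.w) * 0.5f + 0.5f;
            float v = (clip.y / clip.w) * 0.5f + 0.5f;
            float z = clip.z / clip.w;
            if (u < 0.0f || u >= 1.0f || v < 0.0f || v >= 1.0f || z <= 0.0f || z >= 1.0f)
                continue;
            int px = (int)(u * w), py = (int)(v * h);
            // Keep the closest depth landing on the pixel (the GPU version would use
            // an atomic min on a uint-encoded depth).
            out[py * w + px] = std::min(out[py * w + px], z);
        }
    // Hole-filling heuristics (dilation, reusing neighbouring depths) would go here;
    // they trade a small risk of over-culling for more occlusion.
    return out;
}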

There is an alternative form that we will likely implement at some point that removes all pops at the cost of a bit of extra book-keeping and complexity that doesn't use previous-frame depth at all. You instead prime the depth buffer with all previously-visible geometry (e.g. everything that passed occlusion the previous frame) and use that buffer for occlusion-culling. This does require a fully gpu-driven renderer though.
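
In pseudo-code terms the scheme is roughly the following (host-side sketch with illustrative names, not our implementation; the callbacks stand in for GPU work):

#include <cstdint>
#include <vector>

// "Prime with last frame's visible set" culling, as described above.
struct TwoPassCulling
{
    std::vector<uint32_t> lastVisible;   // object IDs that passed occlusion last frame

    template <typename DrawDepthFn, typename BuildHiZFn, typename TestFn>
    std::vector<uint32_t> Run(const std::vector<uint32_t>& allObjects,
                              DrawDepthFn drawDepth, BuildHiZFn buildHiZ, TestFn passesHiZ)
    {
        // 1) Rasterize only last frame's visible objects into the depth buffer,
        //    at the *current* camera transform (no reprojection involved).
        drawDepth(lastVisible);
        buildHiZ();

        // 2) Test everything against that pyramid; whatever passes is drawn this frame.
        std::vector<uint32_t> visible;
        for (uint32_t id : allObjects)
            if (passesHiZ(id))
                visible.push_back(id);

        lastVisible = visible;           // becomes the priming set for the next frame
        return visible;
    }
};

Because the priming set is drawn at the current camera transform, the primed buffer only ever contains real occluders for this frame, so the test can only under-cull - newly revealed geometry is never wrongly rejected, which is why the pops disappear.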

In short, a deferred renderer can definitely use Hi-Z for real-world performance wins but as always the answer is “it depends” (on your engine, number of drawn objects, the specific implementation, etc.). For example, our engine uses this depth buffer to cull a few hundred thousand objects per frame IIRC.

Valakor said:
There is an alternative form that we will likely implement at some point that removes all pops at the cost of a bit of extra book-keeping and complexity that doesn't use previous-frame depth at all. You instead prime the depth buffer with all previously-visible geometry (e.g. everything that passed occlusion the previous frame) and use that buffer for occlusion-culling. This does require a fully gpu-driven renderer though.

Good blog post: https://www.rastergrid.com/blog/2010/10/hierarchical-z-map-based-occlusion-culling/
Seems quite a good solution to me.

Vilem Otte said:
It can't help with ray tracing in any way (where you already have an acceleration structure, which is superior to ‘just Hi-Z’), etc.

Indirectly it can. E.g. when Battlefield V was the first RTX game its performance was poor, and they fixed it mostly by doing SSR and using RT only for the failure cases.
Hi-Z then helps with speeding up SSR (or any other screen-space tracing / sampling stuff, eventually).
So besides occlusion culling I'd say SSR sees the most benefit from Hi-Z.

Vilem Otte said:
Creating a simplified representation of the scene (through voxelization and further processing, ensuring each stored voxel is inside geometry, i.e. a proper occluder) and using that could be an option (voxels could be pre-computed and use static geometry only) - but scenes can get quite big (open-world… which kind of goes against this).

Yeah, it's surely possible but too much work I think. Likely you'd want an eroded (or ‘conservative’?) voxelization to ensure the voxels do not cover any empty space. And for that you'd need a solid voxelization first, which methods based on rasterization cannot do robustly? And after that you likely want to do a mesh reduction on the voxel model, and eventually LOD as well, which is some work too.

Two-pass Hi-Z culling does not need occluder geometry, is easy to implement and efficient, and pretty robust I guess. It feels like a decades-old open problem has just been solved.

Geri said:
Forget it. Even if it had any benefits, the juice isn't worth the squeeze. Deferred rendering has no significant benefits with the modern hardware we have today.

Modern HW has no effect on the benefit of deferred, which is to have a constant number of pixels for lighting calculations.
Visibility buffers extend the advantage to material evaluation as well, also helping against the high bandwidth cost of G-buffer generation.
What's better still depends on specific situation, but i guess it's still deferred for most cases.

Hm… To make my question a bit more specific - we are talking about scenes similar to these (quickly captured with the software & test asset pack I had at hand right now):

[Image] Top: view from an interior toward a window; Bottom: the same view in wireframe
[Image] Top: view in the exterior; Bottom: the same view in wireframe

I also know that in the case of the first scene there's a massive difference when using occlusion culling; in the case of the second one - eh - not that much. I intentionally added wireframe views from approximately the same view points. Now let me get to the answers:

Valakor said:
This of course can cause occasional pops, but they are fairly rare in normal gameplay and worth the performance benefit. We currently cull entire meshes, but could conceivably improve things by doing finer-grained culling of meshlets/LODs or even individual triangles.

This doesn't sound that bad then - although at that point it becomes quite gameplay-specific. Things like ‘teleporting’ around the world, travelling at higher speeds or quick turning are, I guess, the worst scenarios (although those can be handled by disabling occlusion culling completely for the current frame).

Valakor said:
There is an alternative form that we will likely implement at some point that removes all pops at the cost of a bit of extra book-keeping and complexity that doesn't use previous-frame depth at all. You instead prime the depth buffer with all previously-visible geometry (e.g. everything that passed occlusion the previous frame) and use that buffer for occlusion-culling. This does require a fully gpu-driven renderer though.

This does sound good enough to me - I already have a GPU-driven renderer (with frustum culling already done on the GPU), based on sub-meshes.
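
For reference, the compaction step in such a setup boils down to something like this (the argument layout mirrors the standard indexed indirect draw in D3D12/Vulkan; the structures and names here are illustrative, not my exact code):

#include <cstdint>
#include <vector>

// Layout of one indexed indirect draw (matches D3D12_DRAW_INDEXED_ARGUMENTS /
// VkDrawIndexedIndirectCommand).
struct DrawIndexedArgs
{
    uint32_t indexCount;
    uint32_t instanceCount;
    uint32_t firstIndex;
    int32_t  vertexOffset;
    uint32_t firstInstance;
};

struct SubMesh
{
    uint32_t indexCount;
    uint32_t firstIndex;
    int32_t  vertexOffset;
    uint32_t instanceId;     // index into the per-instance transform buffer
};

// Every sub-mesh that survives frustum + Hi-Z tests appends one draw to the indirect
// buffer (on the GPU this is one thread per sub-mesh with an atomic counter, and
// ExecuteIndirect / vkCmdDrawIndexedIndirect consumes the compacted list).
template <typename CullFn>
std::vector<DrawIndexedArgs> CompactVisible(const std::vector<SubMesh>& subMeshes,
                                            CullFn isVisible)
{
    std::vector<DrawIndexedArgs> args;
    for (const SubMesh& sm : subMeshes)
    {
        if (!isVisible(sm))
            continue;       // frustum- or occlusion-culled
        args.push_back({ sm.indexCount, 1u, sm.firstIndex, sm.vertexOffset, sm.instanceId });
    }
    return args;
}

Reusing the firstInstance field as an instance index is just one common way to reach per-instance data from the indirect draw; use whatever your renderer already does.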

JoeJ said:
Indirectly it can. E.g. when Battlefield V was the first RTX game its performance was poor, and they fixed it mostly by doing SSR and using RT only for the failure cases. Hi-Z then helps with speeding up SSR (or any other screen-space tracing / sampling stuff, eventually). So besides occlusion culling I'd say SSR sees the most benefit from Hi-Z.

Hm… that sounds like a neat optimization - practically speaking one could use it for generic rays (although it will probably make sense only for reflection rays and possibly shadow rays to some extent). While I might get to that situation eventually, it will probably already help with reflections when ray tracing is turned off (because at that point a screen-space approach is used… or good old cube reflection maps).

JoeJ said:
Yeah, it's surely possible but too much work I think. Likely you'd want an eroded (or ‘conservative’?) voxelization to ensure the voxels do not cover any empty space. And for that you'd need a solid voxelization first, which methods based on rasterization cannot do robustly? And after that you likely want to do a mesh reduction on the voxel model, and eventually LOD as well, which is some work too. Two-pass Hi-Z culling does not need occluder geometry, is easy to implement and efficient, and pretty robust I guess. It feels like a decades-old open problem has just been solved.

My thoughts were to voxelize the scene and then possibly erode the voxels + check that each voxel is inside geometry (I was also considering a specific way to perform the voxelization). As that is quite a lot of computation, it would need to be precomputed. Rendering the scene into Hi-Z just seems more efficient and generic in this regard (also, with the additional bonus that I don't need to store any additional geometry/buffers/etc. on disk and pre-bake them). Also, standard Hi-Z can run directly in the editor (compared to pre-baked solutions).

The main question is whether the benefits outweigh the costs. Ray tracing uses its own acceleration structures, so this is mainly for main camera viewport rendering.

My current blog on programming, linux and stuff - http://gameprogrammerdiary.blogspot.com

Vilem Otte said:
I also know that in the case of the first scene there's a massive difference when using occlusion culling; in the case of the second one - eh - not that much.

Yeah it's definitely scene-specific how much benefit you'll see, but at least in our case it's a moderate win even in fairly open scenes. This is usually because the terrain does a decent job of occluding lots of medium- and far-distance geometry. Of course the absolute worst case is looking down on the world from some high elevation where almost nothing is occluded, but that doesn't tend to happen in our game or is offset by all the LODs being naturally pushed out in such situations.

Vilem Otte said:
Things like ‘teleporting’ around the world, travelling at higher speeds or quick turning are, I guess, the worst scenarios (although those can be handled by disabling occlusion culling completely for the current frame).

Yup - we “cut” the camera in a handful of situations and ignore the occlusion buffer on the subsequent frame. We definitely get minor spikes on these frames, but it's a rare occurrence overall and not generally noticeable (because of the discontinuity of the visuals). High-speed movement and turning are definitely weak points of this technique, but again the approach works well for the type of camera we have.

Some other notes:

  • A huge advantage of this technique is that it requires no manual authoring on the part of our artists which is a real time-saver…
    • …but if we did want hand-authored occluders for some reason, they'd be easy to add by marking specific pieces of geometry as occluder-only and rendering them only in the initial depth pass.
  • We don't currently do this, but this technique can also work for shadowed lights if you can afford the additional memory and bookkeeping associated with more depth pyramids. We're probably going to implement this for our primary directional light shadow cascades.

So, I did try it and these are the results (time-wise):

[Image: profiler capture of a frame with Hi-Z occlusion culling enabled]

Note: Please don't pay attention to the full ‘Frame’ and ‘Raytracer’ passes (full path tracing comes at a cost). So, for comparison with the non-occlusion-culled version:

[Image: profiler capture of the same view without occlusion culling]

Geometry-wise it is a major difference. I currently went for a variant where I render the scene once and generate the Hi-Z map (full-sized). Then I calculate frustum and occlusion culling - which builds an indirect args list for the main multi-sampled G-Buffer render. A few notes from me:

  • Generating Hi-Z comes at the cost of rendering the scene AND calculating mipmaps with a max filter
    • This cost is currently higher than rendering the scene without Hi-Z at all… but:
      • Max-filter mip mapping on my side is bad - it builds 1 level per dispatch in a compute shader (I was just too lazy to adapt my high-performance mipmap generator); see the sketch after this list for what a single downsample step boils down to
      • Full resolution for Hi-Z might be overkill (this being said, going too low on size introduces popping!)
      • I could always re-use the depth buffer from the last frame - at the cost of additional popping
    • Sponza is probably a poor choice for testing this (it proves occlusion culling works, but that's it). There is not really that much to occlude from any view
  • Occlusion culling is far from perfect
    • When one moves around, the occlusion from Hi-Z just doesn't seem to be as good as I expected (compared to pre-computation methods)
  • Culling phase is FAST
    • I expected this to slow me down, but no - the compute shader does this blazingly fast
  • Using bounding boxes vs. bounding spheres
    • Practically speaking, culling could be improved by going from a bounding box to a bounding sphere (it could require fewer computations for culling)
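
As referenced above, one max-filter downsample step is essentially the following (CPU version for clarity; the compute shader equivalent runs one thread per destination texel - an illustration, not my actual implementation):

#include <algorithm>
#include <vector>

// One max-filter downsample step of the Hi-Z pyramid. 'src' is the finer mip, row-major.
std::vector<float> DownsampleMax(const std::vector<float>& src, int srcW, int srcH)
{
    const int dstW = std::max(srcW / 2, 1);
    const int dstH = std::max(srcH / 2, 1);
    std::vector<float> dst(dstW * dstH);
    for (int y = 0; y < dstH; ++y)
        for (int x = 0; x < dstW; ++x)
        {
            // 2x2 footprint in the finer level, clamped at the border.
            const int x0 = std::min(2 * x,     srcW - 1);
            const int x1 = std::min(2 * x + 1, srcW - 1);
            const int y0 = std::min(2 * y,     srcH - 1);
            const int y1 = std::min(2 * y + 1, srcH - 1);
            // Keep the farthest depth so coarse texels never claim more occlusion
            // than the fine ones actually provide.
            dst[y * dstW + x] = std::max(std::max(src[y0 * srcW + x0], src[y0 * srcW + x1]),
                                         std::max(src[y1 * srcW + x0], src[y1 * srcW + x1]));
        }
    // Note: for non-power-of-two sizes a fully conservative pyramid also has to fold
    // the leftover row/column into the last destination texel.
    return dst;
}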

Currently the whole culling runs GPU-only. The only thing done by the CPU every frame is preparing the draw indirect buffer (if that gets too slow, all instances can be recorded only once - and only added/deleted on demand). The rest is all on the GPU (creating the Hi-Z buffer, frustum and occlusion culling, and rendering the indirect buffer that results from the culling).
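
For completeness, the per-instance occlusion test in the culling phase boils down to roughly the following (CPU-side sketch of the usual Hi-Z test against the AABB corners; SampleHiZ stands in for a texel fetch from the max-filtered pyramid, the small math helpers are illustrative, and the assumed depth convention is 0 = near, 1 = far):

#include <algorithm>
#include <cmath>

struct Vec4 { float x, y, z, w; };
struct Mat4 { float m[16]; };   // row-major

static Vec4 Mul(const Mat4& a, const Vec4& v)
{
    return { a.m[0]*v.x  + a.m[1]*v.y  + a.m[2]*v.z  + a.m[3]*v.w,
             a.m[4]*v.x  + a.m[5]*v.y  + a.m[6]*v.z  + a.m[7]*v.w,
             a.m[8]*v.x  + a.m[9]*v.y  + a.m[10]*v.z + a.m[11]*v.w,
             a.m[12]*v.x + a.m[13]*v.y + a.m[14]*v.z + a.m[15]*v.w };
}

// Stand-in for a texel fetch from the max-filtered depth pyramid.
float SampleHiZ(int mip, int x, int y);

// Returns true if the instance's bounding box is provably hidden.
// 'corners' are the 8 world-space AABB corners.
bool IsOccluded(const Mat4& viewProj, const Vec4 corners[8],
                int hiZWidth, int hiZHeight, int hiZMipCount)
{
    float minU = 1e30f, minV = 1e30f, maxU = -1e30f, maxV = -1e30f, minZ = 1e30f;
    for (int i = 0; i < 8; ++i)
    {
        Vec4 clip = Mul(viewProj, corners[i]);
        if (clip.w <= 0.0f) return false;           // crosses the near plane: keep it
        float u = (clip.x / clip.w) * 0.5f + 0.5f;
        float v = (clip.y / clip.w) * 0.5f + 0.5f;
        minU = std::min(minU, u); maxU = std::max(maxU, u);
        minV = std::min(minV, v); maxV = std::max(maxV, v);
        minZ = std::min(minZ, clip.z / clip.w);     // closest depth of the box
    }
    // Pick the mip where the screen-space rect spans at most ~2 texels per axis,
    // so four fetches cover the whole footprint.
    float wPix = (maxU - minU) * hiZWidth;
    float hPix = (maxV - minV) * hiZHeight;
    int mip = (int)std::ceil(std::log2(std::max(std::max(wPix, hPix), 1.0f)));
    mip = std::min(mip, hiZMipCount - 1);
    int mipW = std::max(hiZWidth  >> mip, 1);
    int mipH = std::max(hiZHeight >> mip, 1);
    int x0 = std::clamp((int)(minU * mipW), 0, mipW - 1);
    int x1 = std::clamp((int)(maxU * mipW), 0, mipW - 1);
    int y0 = std::clamp((int)(minV * mipH), 0, mipH - 1);
    int y1 = std::clamp((int)(maxV * mipH), 0, mipH - 1);
    float farthest = std::max(std::max(SampleHiZ(mip, x0, y0), SampleHiZ(mip, x1, y0)),
                              std::max(SampleHiZ(mip, x0, y1), SampleHiZ(mip, x1, y1)));
    // Hidden if even the closest point of the box lies behind the farthest depth
    // rasterized anywhere under its footprint.
    return minZ > farthest;
}

Switching to bounding spheres (as mentioned in the list above) mostly changes how the screen-space rect and nearest depth are derived; the pyramid lookup itself stays the same.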

My current blog on programming, linux and stuff - http://gameprogrammerdiary.blogspot.com

Vilem Otte said:
doesn't seem to be as good as I expected

I tried to warn you. Scrap it before you waste another month of your life on this technology. And do not try to clone technologies just because some obsolete jokeware engine which was designed 20 years ago relies on them.
