Unorthodox shadowing approach foolish?

Started by
4 comments, last by Thaumaturge 2 years, 5 months ago

Hello, I'm developing a 3D engine on my own. I've studied and adopted techniques like shadow mapping and deferred shading. Although the results looked good, I wasn't satisfied at all, because the approach felt somehow non-universal (rendering the scene X times from different views, distance-dependent shadow-map dimensions, post-processing of shadow aliasing, …).

With the experience gained, I started doing it "my way". I created a three-dimensional grid of small cells; from any position the corresponding cell can be derived. The content of a cell is a list of vertex indices (the polygons that overlap that cell). The grid table, the vertex indices and the vertices are bound as separate structured buffers to the pixel shader. Until yesterday I was quite confident that shadowing could be done by checking whether the current pixel is hidden from the light source by any polygon in between. To that end I wanted to follow the light ray from target to source on a per-cell basis (ray-tracing-like), but I had to stop at the very first cell: the frame-rate drop was disastrous.
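For reference, the per-cell ray stepping described above is usually done with a 3D DDA (Amanatides & Woo style). A minimal sketch, assuming a uniform grid with a hypothetical cell size `CELL` and a caller-supplied callback that tests the polygons stored in one cell (this is not the poster's actual shader code, just the traversal idea):

```python
# Sketch of per-cell ray stepping (3D DDA). CELL and all names are
# illustrative assumptions, not taken from the engine discussed here.
CELL = 1.0  # assumed cell edge length in world units

def traverse(origin, direction, max_cells, visit_cell):
    """Walk the grid cells pierced by a ray; stop when visit_cell finds a hit."""
    cell = [int(origin[i] // CELL) for i in range(3)]   # current cell coords
    step, t_max, t_delta = [0] * 3, [float('inf')] * 3, [float('inf')] * 3
    for i in range(3):
        if direction[i] > 0:
            step[i] = 1
            t_max[i] = ((cell[i] + 1) * CELL - origin[i]) / direction[i]
            t_delta[i] = CELL / direction[i]
        elif direction[i] < 0:
            step[i] = -1
            t_max[i] = (cell[i] * CELL - origin[i]) / direction[i]
            t_delta[i] = CELL / -direction[i]
    visited = []
    for _ in range(max_cells):
        visited.append(tuple(cell))
        if visit_cell(tuple(cell)):     # occluder found -> pixel is shadowed
            return True, visited
        a = min(range(3), key=lambda i: t_max[i])   # axis of nearest cell wall
        cell[a] += step[a]
        t_max[a] += t_delta[a]
    return False, visited
```

Each loop iteration is one cell lookup plus one polygon-list scan, which is exactly where the dependent memory reads pile up.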

Before fetching all neighbor polygons (first the grid, then the indices and finally the vertices) I had frame rates around 1200/500 (extrapolated max/min within one second); afterwards around 150/60. The whole scene needs only 21 draw calls; the draw units are multiples of the mentioned cells. Altogether there are only 1991 vertices, some of which are duplicated to ensure locality in memory. Most cells contain around 10 polygons; cells with more than 15 polygons are colored white for debugging (no cell has more than 25). The vertex size is 32 bytes, the output resolution is 4K, and DirectX 11 is used.

4 + 10 × 3 × 4 + 10 × 3 × 32 = 1084 bytes read per pixel (cell header + 10 triangles × 3 indices × 4 bytes + 30 vertices × 32 bytes)
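That figure can be sanity-checked, and extrapolated to a full 4K frame, with a bit of arithmetic (3840×2160 is an assumption for "4K"; real traffic would be lower with cache reuse):

```python
# Back-of-envelope check of the bytes-per-pixel figure above, and the
# implied worst-case traffic for one 3840x2160 frame (one cell per pixel).
cell_header = 4
indices = 10 * 3 * 4        # ~10 triangles * 3 indices * 4 bytes = 120
vertices = 10 * 3 * 32      # 30 vertices * 32 bytes = 960
per_pixel = cell_header + indices + vertices
per_frame = per_pixel * 3840 * 2160
print(per_pixel)                      # 1084
print(round(per_frame / 2**30, 1))    # ~8.4 GiB per frame, for a single cell
```

At roughly 8 GiB of reads per frame for just the first cell, a bandwidth-bound older GPU landing in the tens of frames per second is at least plausible, which is consistent with the observed drop.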

I knew that the described approach would be challenging. Actually I wanted to march several meters towards the sun (instead of only 0.5 m), and also towards all nearby lights. My GTX 650 Ti with 2 GB RAM is quite old, but the outcome still surprised me hard. CPU collision detection, dynamic objects in the scene, more mesh detail, … are all still missing. I'm not sure whether even modern GPUs, or those in 10-20 years, would be able to handle as many dependent memory reads as I'm planning. According to the replies at https://computergraphics.stackexchange.com/questions/10685/fastest-way-to-read-only-from-buffer going that way is actually not a good idea.

I've already thought about smaller cells, but then the grid would need to get much larger (eventually 16 GB+ for a 500 m terrain) while the polygons would need to get smaller to still fit into the cells → more polygons and, in the end, the same number of checks. But with shadow mapping plus volumetric lights and ray marching I would also have a horrible number of texture accesses per pixel (one for each marching step against every light source)…

I don't want to believe that my approach is complete garbage. Do you have any advice for me? I've read about similar ways to implement ray tracing, so I can't be totally wrong, can I?


So what you want is just raytracing.
I see this problem with your approach: you use a regular grid, so you have to traverse every cell along the ray, which is brute force and not work-efficient.

In contrast, acceleration structures like a BVH make it possible to skip over empty space, and they support non-static geometry without needing to rebuild the structure every frame. So that's much better.

Another option of growing popularity is signed distance fields (SDFs). They also allow skipping empty space, but they are volume data, so they cost a lot of memory, and SDF generation is not real-time. Supporting deforming geometry such as characters is thus a problem, possibly solved with approximations like capsules.
Advantages over classic triangle raytracing: triangle intersection is not needed if the volumetric approximation is good enough.
Multiple overlapping models work well because we only need to process the one with the shortest distance at ray entry; only if the distance starts to increase while tracing that model do we need to check the others again. (Not 100% sure about this.) RT+BVH would need to process them all to find the closest intersection.
The tracing code also has less complexity and divergence than classic RT.
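The empty-space skipping described here is what sphere tracing does: the SDF value at a point is a safe distance we can jump along the ray. A minimal sketch, with a toy single-sphere field (the scene function and all names are illustrative, not from any real engine):

```python
# Minimal sphere-tracing sketch. The SDF's value at a point bounds the
# distance to the nearest surface, so we can step by that amount safely.
import math

def scene_sdf(p):
    # toy field: signed distance to a unit sphere centered at (0, 0, 5)
    return math.dist(p, (0.0, 0.0, 5.0)) - 1.0

def sphere_trace(origin, direction, max_steps=64, eps=1e-4, t_max=100.0):
    """March along the ray, stepping by the current distance bound."""
    t = 0.0
    for _ in range(max_steps):
        p = tuple(origin[i] + t * direction[i] for i in range(3))
        d = scene_sdf(p)
        if d < eps:
            return t          # hit: surface reached
        t += d                # safe skip: nothing lies within radius d
        if t > t_max:
            break
    return None               # miss
```

Note there is no per-triangle test anywhere in the loop, which is the advantage mentioned above, at the price of the volumetric approximation.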

Finally, we could try to optimize for a specific application. If it's shadows, we can take advantage of the fact that all rays have the same target: the light source. So we could trace from the light source to the scene instead of the other way around. Then all rays have the same origin and will access the same memory, at least initially, which helps performance. But wait: if all rays have the same origin, isn't that just a camera? Then we could use fast rasterization instead of slow raytracing!
At this point we realize: good old shadow maps are not cool, but pretty efficient :D

Why do we need raytracing for shadows, then? To fix the shortcomings of shadow maps, like bad accuracy, or wasted work on texels we never read from? No; look up UE5's virtual shadow maps to see how they solve this.
But in the real world, point lights do not exist. Lights have an area, and area lights give nice soft shadows. An area light does not have its origin in a single point. So if that was your goal, it starts to make sense to think about alternatives.
Another reason would be the need for many shadowed lights, where shadow maps become impractical. But if we go there, we may be better off solving global illumination as a whole, not just shadows.

@JoeJ thanks for your reply!

Sure, I see the advantage of a BVH for easier empty-space checks, but on the other hand I would need to traverse a tree data structure, which is a pain, especially with dynamic updates. And the traversal in the pixel shader from the root to the leaves shouldn't be underestimated if the terrain is large (fast jumps to neighbor leaves…? more and more complexity; that was actually the reason I went with the grid). Besides that, what would the content of static terrain leaves be, other than what I already have in my cells? OK, a tree is much smaller than a grid, so the cells could get smaller → fewer polygon checks, that's correct. With my approach, empty space is detectable with only one grid lookup, plus a second lookup for dynamic content (not optimal, but I think that's still fair).

Regarding "brute force": that's correct within a cell, but it's local brute force, so valid in my view. Still, I can't deny that, in sum, there are too many memory reads…

SDFs look interesting, but "offline generation" and "using approximations instead of the real mesh shape" seem like the wrong way to me. I have to read more about them first; maybe there are helpful ideas I haven't understood so far.

Yes, shadow mapping is very efficient and would be the only real alternative for me right now. But as you said, it becomes impractical with too many lights, and then I would probably have memory-read issues similar to the ones I have now.

Actually, I can live very well with only a parallel light (sun) and point lights (torches, …); no need for area lights. I also wanted to use the described approach for lighting in general. Shadowing is just one of the most important aspects of graphical authenticity, and one known to everybody.

Yeah, in short: every solution totally sucks. Personally I can't even use DXR, because it can't work with continuous LOD. Thus I'm still interested in compute tracing myself, at least for reflections.

I remember this demo used a regular grid; there was a blog post explaining the details:

Then there is Crytek's compute raytracing, which AFAIK uses voxels, so maybe a regular grid as well, or perhaps a form of sparse grid to help against the brute force. The results are very impressive; it runs faster than the small, old demo above.
Maybe their stuff is included in the open-source CryEngine by now (it wasn't yet when the demo came out).

goll said:
OK, a tree is much smaller than a grid, so the cells could get smaller → fewer polygon checks, that's correct. With my approach, empty space is detectable with only one grid lookup, plus a second lookup for dynamic content (not optimal, but I think that's still fair).

This requires some corrections:
The number of triangle intersections remains the same, but a tree needs fewer bounding-box checks, which is where most of the work goes. The runtime complexity is O(log n) vs. O(n), so the larger and more detailed the scene, the larger the win from using a tree. The potential speedup quickly reaches orders of magnitude.
Complexity is no real problem either. We could use a stackless approach, which usually is not ideal for RT but is really simple and not much more complicated than the grid.
EDIT: Supporting dynamic objects is much better with a BVH as well. Because the BVH bounds do not rely on a global static grid, we can refit or even just transform the tree without needing to rebuild it every frame.
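One common form of such a stackless traversal stores the nodes in depth-first order, each with a "skip" index pointing past its whole subtree: hit the node's box and fall through to its first child, miss it and jump to skip. A minimal sketch; the node layout and field names are illustrative, not from any real engine:

```python
# Stackless BVH traversal sketch: depth-first node array with skip links.
# (Assumes the ray origin is not exactly on a slab plane of a zero-direction
# axis, where the inf trick below would produce NaN.)

def slab_test(bmin, bmax, origin, inv_dir):
    """Standard slab test: does the ray hit this AABB?"""
    t0, t1 = 0.0, float('inf')
    for i in range(3):
        a = (bmin[i] - origin[i]) * inv_dir[i]
        b = (bmax[i] - origin[i]) * inv_dir[i]
        t0, t1 = max(t0, min(a, b)), min(t1, max(a, b))
    return t0 <= t1

def occluded(nodes, origin, direction):
    """Return True as soon as any leaf reports a hit (enough for shadows)."""
    inv_dir = tuple(1.0 / d if d != 0 else float('inf') for d in direction)
    i = 0
    while i < len(nodes):
        n = nodes[i]
        if slab_test(n['bmin'], n['bmax'], origin, inv_dir):
            if n.get('leaf') and n['hit_test'](origin, direction):
                return True
            i += 1             # descend: first child is the next node in order
        else:
            i = n['skip']      # miss: jump past this node's whole subtree
    return False
```

No stack, no parent pointers; the loop is barely more code than the grid walk, which is the point being made above. For shadow rays, returning on the first hit (any-hit) avoids searching for the closest intersection.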

The key problem with RT is divergent memory access. If one thread maps to one ray and traverses the scene independently, we have the worst case of random memory access. Even HW acceleration can't help with this.
As a solution there is ray reordering, which tries to keep batches of rays processing batches of the scene. It's really complicated, and requires shuffling rays across workgroups close to the inner loops, which can't be done efficiently with the current compute programming model. There was an NV paper where they had a good algorithm and used a GPU HW simulation to test it, but the speedups were still only about 2x (IIRC) for incoherent rays (GI, AO). For shadows it doesn't help much at all.
ImgTech had such a method in hardware for their first raytracing GPU, which came out years before RTX. Idk what NV is doing, but the DXR API design hints that there is no global reordering going on.

Personally I think we need to work out such reordering ideas in more detail, possibly with different algorithms for different applications. And we need to utilize LOD, so scene complexity can adapt to the required accuracy / performance budget.
Current RT HW / APIs allow neither, so our hands are tied, and doing 'foolish' research just for fun is indeed the only option. Keep it up! :D

Have you looked into the "shadow volume" technique (à la Doom 3)? That might prove to be a useful alternative to shadow mapping, I imagine especially if your scene is fairly low-poly.


