Bindless textures warp access

2 comments, last by tomas.drinovsky 1 year, 11 months ago

Hello,

I have run into a peculiar problem using the bindless textures approach in DX12, and I would appreciate it if somebody more hardware-oriented could shed some light on it.

So here is a minimal reproducible sample:

...
int cascadeIndex = 0;
float3 mul = 0;

// Checkerboard: pixels whose x and y have the same parity use cascade 1 (tinted red),
// the others use cascade 2 (tinted green).
if(screenCoord.x % 2 == screenCoord.y % 2)
{
    mul = float3(1, 0, 0);
    cascadeIndex = 1;
}
else
{
    mul = float3(0, 1, 0);
    cascadeIndex = 2;
}

return mul * CascadeDepthsTexture[cascadeIndex].Sample(ShadowSampler, Saturate(ScreenToUV(screenCoord)));

It should render a checkerboard, where odd pixels use a different texture than even pixels.

The problem is that on some hardware this does not work: either artifacts appear or only one texture is sampled.

NVIDIA RTX 3060 - working, both textures are sampled.

Radeon RX 5600M - not working, only one texture is sampled. When the Sample instruction is replaced with Load, blocky artifacts appear.

Are there any limitations on using bindless textures with respect to warps? Do you have any tips on what I could check?

Thanks a lot in advance.

You need to use NonUniformResourceIndex for that to work correctly, like this:

return mul * CascadeDepthsTexture[NonUniformResourceIndex(cascadeIndex)].Sample(ShadowSampler, Saturate(ScreenToUV(screenCoord)));

You need that whenever the index into your descriptor array could have different values within a wave. NVIDIA ignores that hint because they work out uniformity on their own, but AMD relies on it for correct results (which is why it works on NVIDIA but is broken on AMD). If this is a pixel shader, you don't really know how your threads map to waves, except that threads in a 2x2 quad are in the same wave. Since you're varying the index within a quad, you know for sure that the index will vary within a wave, so you definitely need NonUniformResourceIndex. In most other situations within a pixel shader, you can only assume the index is uniform if it has the same value for the entire Draw (for instance, if it comes from a constant buffer) or if you use wave operations to force the index to be wave-uniform.
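To make the quad argument concrete, here is a small CPU-side model in Python of one 2x2 quad running the repro shader. It is an illustration, not HLSL and not how any driver is actually implemented. It assumes a simplified behavior for the missing hint: when the index is treated as wave-uniform, the value from the first lane is used by every lane. That assumption is enough to reproduce the "only one texture is sampled" symptom. All function names are hypothetical.

```python
# Hypothetical CPU model of one 2x2 pixel quad (4 lanes). Real waves are typically
# 32 or 64 lanes wide, but a quad always lives in a single wave, so one quad is enough.

def cascade_index_for(x: int, y: int) -> int:
    """Cascade index the repro shader picks for pixel (x, y)."""
    return 1 if x % 2 == y % 2 else 2

def quad_indices(x0: int, y0: int) -> list[int]:
    """Per-lane indices for the quad whose top-left pixel is (x0, y0), row by row."""
    return [cascade_index_for(x0 + dx, y0 + dy) for dy in (0, 1) for dx in (0, 1)]

def index_varies_in_every_quad(width: int, height: int) -> bool:
    """True if every even-aligned 2x2 quad sees more than one index value."""
    return all(len(set(quad_indices(x, y))) > 1
               for y in range(0, height, 2) for x in range(0, width, 2))

def fetch(texture_index: int) -> str:
    """Stand-in for CascadeDepthsTexture[i].Sample(...): reports which texture was read."""
    return f"tex{texture_index}"

def sample_assuming_uniform(indices: list[int]) -> list[str]:
    """ASSUMED behavior without the hint: the first lane's index is used by every lane."""
    uniform_index = indices[0]
    return [fetch(uniform_index) for _ in indices]

def sample_per_lane(indices: list[int]) -> list[str]:
    """Behavior NonUniformResourceIndex asks for: each lane uses its own index."""
    return [fetch(i) for i in indices]

if __name__ == "__main__":
    idx = quad_indices(0, 0)
    print(idx)                               # [1, 2, 2, 1]
    print(index_varies_in_every_quad(8, 8))  # True
    print(sample_assuming_uniform(idx))      # ['tex1', 'tex1', 'tex1', 'tex1']
    print(sample_per_lane(idx))              # ['tex1', 'tex2', 'tex2', 'tex1']
```

Every quad of the checkerboard contains both indices, so no wave can ever be uniform here, regardless of how the hardware packs quads into waves.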

Keep in mind that using varying descriptor indices + NonUniformResourceIndex can make things more expensive. GPUs will typically "waterfall" when this happens, which basically means they will do N texture fetches, where N is the number of different index values that occur within a wave. In your case it means your shader will sample from both cascadeIndex == 1 and cascadeIndex == 2, and then select the appropriate result, effectively doubling your texture fetches. You could potentially avoid this by using a Texture2DArray instead of two separate texture descriptors.
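The waterfall can be modeled the same way. The sketch below is a hypothetical CPU-side Python model that assumes a scalarization-style loop: take the index of the first still-active lane (as WaveReadLaneFirst would), let every lane with that index sample, retire those lanes, and repeat. The iteration count equals the number of distinct index values in the wave. `array_sample` models the Texture2DArray alternative, where the descriptor is uniform and the slice is ordinary per-lane coordinate data, so a single fetch suffices. The function names and the `fetch` callback are illustrative assumptions.

```python
from typing import Callable

# Hypothetical: fetch(texture_index, lane) stands in for sampling one texture at that lane's UV.
Fetch = Callable[[int, int], float]

def waterfall_sample(indices: list[int], fetch: Fetch) -> tuple[list[float], int]:
    """Per-lane results plus the number of loop iterations (one per distinct index)."""
    results = [0.0] * len(indices)
    pending = set(range(len(indices)))      # lanes that still need their result
    iterations = 0
    while pending:
        first = min(pending)                # like WaveReadLaneFirst: first active lane
        uniform_index = indices[first]      # this value is uniform for the iteration
        iterations += 1
        for lane in sorted(pending):
            if indices[lane] == uniform_index:
                results[lane] = fetch(uniform_index, lane)
        # Lanes that matched are done; the rest go around the loop again.
        pending = {lane for lane in pending if indices[lane] != uniform_index}
    return results, iterations

def array_sample(indices: list[int], fetch: Fetch) -> tuple[list[float], int]:
    """Texture2DArray model: one uniform descriptor, slice index is per-lane data."""
    return [fetch(slice_index, lane) for lane, slice_index in enumerate(indices)], 1

if __name__ == "__main__":
    def demo_fetch(texture_index: int, lane: int) -> float:
        return texture_index * 10.0 + lane  # hypothetical "texel" value

    quad = [1, 2, 2, 1]                        # checkerboard indices in one 2x2 quad
    print(waterfall_sample(quad, demo_fetch))  # ([10.0, 21.0, 22.0, 13.0], 2)
    print(array_sample(quad, demo_fetch))      # ([10.0, 21.0, 22.0, 13.0], 1)
```

Both versions return the same per-lane values for the checkerboard quad, but the waterfall needs two iterations where the array version needs one. A wave whose lanes all share one index finishes the waterfall in a single iteration, which is why NonUniformResourceIndex only costs extra when the index actually diverges.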

Thanks a lot!!!

I completely overlooked this hint in the documentation. I'm fully aware of the performance implications; the example above was just a minimal reproducible sample (MRS) to show the problem, not real usage. I agree that for CSM a TextureArray makes more sense (unless each cascade map uses a different resolution).