How many texture lookups is too many?

Started by
8 comments, last by Krypt0n 12 years, 6 months ago
I'm making a shader to process a texture and draw it onto a quad, in order to use that as input to other shaders. Each pixel needs to sample its eight neighbours, but the neighbours of those 8 pixels also need to be sampled. This would require 24 texture lookups per pixel. Is this too many? My platform is just DirectX 10 on a normal PC.

Also, it's not like a blur filter, where you can just sample a random handful of neighbours; this needs to accurately sample all neighbours in a 5x5 area around the pixel in order to work (it's to simulate liquid flowing over a heightmap).

EDIT: Also, I'm just using point sampling, so each sample is a single lookup; there's no filtering.
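To make the sample count concrete, here is a minimal CPU-side sketch in Python of the access pattern described above. It is not shader code: the function names and the clamp-to-edge addressing are assumptions made for illustration, and the texture is modelled as a plain list of rows.

```python
def neighbour_offsets(radius=2):
    """Return (dx, dy) for every texel in the square window, excluding the centre."""
    return [(dx, dy)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)
            if (dx, dy) != (0, 0)]


def point_sample(tex, x, y):
    """Single-texel lookup, no filtering. Clamp-to-edge addressing is an assumption."""
    height = len(tex)
    width = len(tex[0])
    cx = min(max(x, 0), width - 1)
    cy = min(max(y, 0), height - 1)
    return tex[cy][cx]


def gather_neighbours(tex, x, y, radius=2):
    """Collect the neighbours of pixel (x, y): one lookup per neighbour."""
    return [point_sample(tex, x + dx, y + dy)
            for dx, dy in neighbour_offsets(radius)]


# A 3x3 window has 8 neighbours; a 5x5 window has 24 (25 with the centre).
print(len(neighbour_offsets(1)), len(neighbour_offsets(2)))
```

With point sampling, each entry in the offset list corresponds to exactly one lookup, which is where the figure of 24 comes from.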
Too many is whatever number of samples makes it run below the desired frame rate on your target machine. On a GTX 590 at 64x64, the GPU would probably be mostly idle; on an onboard GeForce 9400M at 1280x720 you might get <10 fps (my rough guess).

In general, it's better to profile performance than to ask for opinions about it on some forum ;)
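To put rough numbers on the two cases mentioned above, the raw lookup count per frame can be worked out directly. This is only arithmetic, sketched in Python; it says nothing about actual frame times, which depend on the hardware and still have to be measured. The function names are made up for illustration.

```python
def fetches_per_frame(width, height, samples_per_pixel):
    """Total texture lookups issued for one full-target pass."""
    return width * height * samples_per_pixel


def fetches_per_second(width, height, samples_per_pixel, fps):
    """Total lookups per second at a given (target) frame rate."""
    return fetches_per_frame(width, height, samples_per_pixel) * fps


small = fetches_per_frame(64, 64, 24)      # 98,304 lookups per frame
large = fetches_per_frame(1280, 720, 24)   # 22,118,400 lookups per frame
print(small, large, large // small)        # the large case is 225x the small one
```

The 225x gap between the two resolutions is why the same shader can be trivial in one setup and a problem in another.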




Profiling means actually writing it first though :wink:

I just want to get some indication (as I have literally no idea whether 24 samples per pixel is fine or ludicrous) before I possibly waste a week writing something that won't work anyway (this is my hobby, not my job, and I'm not getting paid :P)
Hobby or not, you will find yourself writing code that will never see the light of day. It happens in my professional career and in hobby projects. That all goes into proof of concept and profiling. Consider it a learning experience and go for it.
Code makes the man



No need to waste a week: write a shader that samples 24 times and see what the frame rate is. That's the most basic thing your engine/program/game needs to support if you're asking this question here.

But even if it doesn't, grab FX Composer and try it out: http://developer.nvidia.com/fx-composer

It all might take less time to get a proper answer this way than waiting here for some vague performance estimates ;)




A modern GPU will obviously run a shader that requires 24 samples... the actual performance characteristics are complex though. The only way to know is to try it and measure how well it runs ;)
On a modern GPU, I'd expect it to run well. On a DX9-level GPU, I'd expect that to take at least several milliseconds on a screen-sized texture.

How well it copes with those samples depends on a lot of things... Ideally, while waiting on the texture fetches, it should switch 'threads' and process another group of pixels. Its ability to do so depends on the number of pixels per batch, fetch latency, fetch coherency, size of the L2 texture cache, texture format, etc. Also note that the size of caches, speed of memory and the actual fetch behaviour vary greatly between graphics cards.

As an example of hardware quirks -- some cards are designed with some fetch units dedicated to point-sampling (and vertex fetching), and other units dedicated to filtered-sampling -- on such a card you could in theory have more fetches in flight at once with a mixture of point/linear sampling :(

Are there other pixels for the GPU to switch to during latencies? In your case, drawing a large quad should be the best case, with the GPU able to schedule a large number of pixel groups at one time, giving it the best chance to hide latencies with good pipelining.
Are the texels being fetched similar to those fetched by neighbouring pixels? In your case, it sounds like many of the texels fetched by one pixel will also be used by neighbouring pixels (which are likely in the same thread group), meaning the total amount of data to be fetched into L2 is actually less than 24 texels per pixel.
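The overlap between neighbouring pixels can be counted with a small Python sketch. The 8x8 tile is a hypothetical pixel group chosen for illustration; real GPUs group pixels in hardware-specific ways, and real caches work on lines of texels rather than individual texels, so treat the result as an idealised bound rather than a prediction.

```python
def tile_fetch_stats(tile_w, tile_h, radius=2):
    """Count the lookups issued by a tile of pixels and the distinct texels they touch.

    Each pixel samples every neighbour in a (2*radius+1)^2 window, centre excluded.
    """
    total = 0
    unique = set()
    for py in range(tile_h):
        for px in range(tile_w):
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    if dx == 0 and dy == 0:
                        continue
                    total += 1
                    unique.add((px + dx, py + dy))
    return total, len(unique)


# Hypothetical 8x8 pixel group with the 5x5 window from the question.
total, unique = tile_fetch_stats(8, 8)
print(total, unique, unique / 64)  # lookups issued, distinct texels, distinct texels per pixel
```

For that tile the 64 pixels issue 1536 lookups but touch only 144 distinct texels, i.e. 2.25 distinct texels per pixel instead of 24.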
Thanks for the explanations :)

I've also been reading a bit about the 'fetch4' command, which will fetch 4 samples in a 2x2 block at the same time. I've seen mixed messages about it though: it seems to have originally been an ATi-only command (I use an nVidia GTX285), but some sources say nVidia drivers now support it, while others say they don't. Would I be able to use this?
Gather4 is now standard as of DX10.1.
Fetch4 is slightly different, and is available on ATI Radeon DX9 cards.
N.B. both of these are designed to be used with single-channel textures (returning 4 "red" components).
DX10.1 added the "Gather" instruction, which gathers the value from the red channel of a 2x2 block of texels. DX11 added GatherRed, GatherGreen, GatherBlue, and GatherAlpha. However, since you're using a feature level 10 GPU, you can't use any of those in your shader.
For heightmap sampling you only need one channel anyway, so you can pack 4 heights into one pixel/texel: a simple way to emulate fetch4.

If you also output just height, you could save even more, as you'd sample a 6x6 area for 4 pixels instead of a 5x5 area for 1 pixel. So instead of 25 texture fetches/pixel, you'd have 36/4/4 = 2.25 texture fetches/pixel.
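The packing idea can be checked on the CPU with a short Python sketch. Everything here is an assumption made for illustration: the function names, the channel order (top-left, top-right, bottom-left, bottom-right), and an even-sized heightmap. Note that clamping at the edges of the packed texture repeats a whole 2x2 block, which is not the same as clamping individual heights, so borders would need separate handling.

```python
def pack_heights(heights):
    """Pack an even-sized heightmap so each texel holds one 2x2 block of heights.

    Assumed channel order: (top-left, top-right, bottom-left, bottom-right).
    """
    h, w = len(heights), len(heights[0])
    return [[(heights[y][x], heights[y][x + 1],
              heights[y + 1][x], heights[y + 1][x + 1])
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]


def fetch_packed(packed, tx, ty):
    """One packed-texel lookup with clamp-to-edge addressing (an assumption)."""
    ph, pw = len(packed), len(packed[0])
    return packed[min(max(ty, 0), ph - 1)][min(max(tx, 0), pw - 1)]


def window_6x6(packed, tx, ty):
    """Rebuild the 6x6 height window around packed texel (tx, ty) from 3x3 = 9 fetches."""
    window = [[None] * 6 for _ in range(6)]
    fetches = 0
    for j in range(-1, 2):
        for i in range(-1, 2):
            tl, tr, bl, br = fetch_packed(packed, tx + i, ty + j)
            fetches += 1
            wx, wy = 2 * (i + 1), 2 * (j + 1)
            window[wy][wx], window[wy][wx + 1] = tl, tr
            window[wy + 1][wx], window[wy + 1][wx + 1] = bl, br
    return window, fetches


# 8x8 test heightmap where each height encodes its own position.
heights = [[y * 8 + x for x in range(8)] for y in range(8)]
packed = pack_heights(heights)
window, fetches = window_6x6(packed, 1, 1)
print(fetches, fetches / 4)  # 9 fetches serve 4 output heights
```

The 6x6 window contains the full 5x5 neighbourhood of each of the four heights stored in the centre texel, so one output texel costs 9 fetches, which matches the 36/4/4 figure above.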

