Understanding the order (point) primitives are drawn in, and determinism


Hi folks,

Maybe you can help me understand how the GPU works a little better? I'm drawing a few million points in a 2d app (no perspective or z-buffer).

When rendering point primitives from a vertex buffer, somehow the points get drawn in the exact order they are in the VB? So if some points overlap, you are still guaranteed the same output every time?

If instead I render the points from a vertex texture, i.e. the vertex shader fetches from a uv coord that's some function of the SV_VERTEXID, are they still in guaranteed order?

But if I use a compute shader, loading points from a StructuredBuffer and writing manually to an output UAV, I assume all bets are off?

Any thoughts on why one of these 3 techniques should be fastest?

Thank you!


Glen.Able said:
somehow the points get drawn in the exact order they are in the VB?

yes, the vb is an ordered stream of verts

Glen.Able said:
So if some points overlap, you are still guaranteed the same output every time?

correct, if point P which is at vbuffer position 7 and transformed by matrix M is repeated at position 50 in the same vbuffer and transformed by the same M, then yes it will have the same output; (that's why, for example, if the GPU's internals can tell that some input will produce the same output — as with a repeated index hitting the post-transform vertex cache — it reuses the previous result instead of wasting time by running the same vshader on the same input)
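to put it in code, a minimal sketch (placeholder names) of why this holds — the vshader is a pure function of its inputs:

// same vert + same M always gives the same clip-space result,
// no matter how many times or in what order the shader runs
cbuffer PerDraw { float4x4 M; };    // placeholder constants

float4 VS(float3 pos : POSITION) : SV_POSITION
{
    // identical input at vb positions 7 and 50 -> identical output here
    return mul(float4(pos, 1.0f), M);
}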

Glen.Able said:
…the vertex shader fetches from a uv coord that's some function of the SV_VERTEXID, are they still in guaranteed order?

that's a good question actually… i don't know if u can use SV_VERTEXID with VTFs (vertex texture fetches). Also, using SV_VERTEXID is not what guarantees order (this System Value only tells the IA stage to generate an ID for each vert). I don't know if vertex texture fetches define a fetching order, or if they are ordered the same way as the texture fetches that go through the TIUs (Texture Input Units) for the Pixel Shader.

My guess would be that the fetching order here is guided by the texture coord generator (by the interpolator)…there's more to be said here…

Glen.Able said:
…I assume all bets are off?

-lol- well really it depends on various factors: if, for example, you're using the UAV for read/write access from multiple threads etc… then yes, all bets are off…
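for example, something like this (a sketch, all names are placeholders) has no defined winner when two points land on the same texel:

StructuredBuffer<float2> points;     // placeholder input
RWTexture2D<float4> outputTex;       // placeholder UAV

[numthreads(64, 1, 1)]
void CS(uint3 dtid : SV_DispatchThreadID)
{
    float2 p = points[dtid.x];
    // unordered write: if two threads hit the same texel,
    // which one lands last is undefined within the Dispatch
    outputTex[uint2(p)] = float4(1, 1, 1, 1);
}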

Glen.Able said:
Any thoughts on why one of these 3 techniques should be fastest?

(respectfully) i think that's not the right question to ask, they all have their uses AND pros and cons. The worst u can do is to use any one of them the “unrecommended” way, right?

Good questions!

A+

Thanks for your thoughts!

The only implementation I've tried so far used the vertex id + vertex texture fetches approach like this…it worked fine:

Texture2D texSrc;       // 2048x2048 texture holding the point positions
SamplerState samp0;     // point sampler

struct VS_TO_PS
{
    float4 pos : SV_POSITION;
};

VS_TO_PS VS(uint vid : SV_VERTEXID)
{
    // generate tex coords from vertex index (2048-wide texture, sampling texel centers)
    float u = (vid & 2047) / 2048.0f + 0.5f / 2048.0f;
    float v = (vid >> 11) / 2048.0f + 0.5f / 2048.0f;

    float4 pointPos = texSrc.SampleLevel(samp0, float2(u, v), 0);

    VS_TO_PS output;
    output.pos = pointPos;  // (the struct and return were elided in the original snippet)
    return output;
}

I never noticed any of the flickering you might expect if it's not running deterministically (i.e. sometimes at some output pixel, point A gets rendered before point B, sometimes vice-versa). But once I started thinking about a compute shader implementation, that possibility occurred to me.

As you said, maybe not the right question, not least because rendering the points probably isn't the slow bit, and you can measure speed easily enough anyway!

ok, however with compute shaders, u could group-synch threads with a barrier to synch the vtf calculation outputs; it's a cheap op, but it adds constraints to the warp scheduler (and could cause stalls); so then again we're back to those pros and cons and trade-offs
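something like this, roughly (a sketch; texSrc/samp0 as in your snippet, the rest are placeholders):

groupshared float4 sharedPts[64];

Texture2D texSrc;
SamplerState samp0;
RWStructuredBuffer<float4> outBuf;   // placeholder output

[numthreads(64, 1, 1)]
void CS(uint3 dtid : SV_DispatchThreadID, uint gi : SV_GroupIndex)
{
    // each thread fetches one point into groupshared memory
    sharedPts[gi] = texSrc.SampleLevel(samp0, float2(gi / 64.0f, 0), 0);

    // cheap, but every thread in the group must reach this before any continues
    GroupMemoryBarrierWithGroupSync();

    // after the barrier all 64 fetches are visible group-wide;
    // note this only synchronizes *within* a group, not across groups
    outBuf[dtid.x] = sharedPts[gi];
}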

anyway thanks for sharing

To answer your first question: graphics APIs like D3D guarantee that primitives have their pixels written to the render target(s) in the order that those primitives are submitted. So in your “standard” VS + PS rendering scenarios this ordering comes from the order of indices in your index buffer (or the order of your vertex buffer, if not using an index buffer). If you're using a geometry shader or tessellation then you're effectively generating new primitives, but the ordering is still well-defined and is derived from the order of your original index buffer.

This generally lets you have a mental model where each primitive is processed one at a time, going all the way from VS → rasterizer → PS → render target + depth/stencil buffer in a serial process. In reality things are more complicated than that… there's actually a lot of parallel processing and overlapping/pipelined stages happening, even within a single Draw call or a single primitive. But the pipeline is set up in such a way that the render target writes still ultimately happen in the order of primitive submission, even if previous stages don't necessarily execute in that order.
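A quick worked example of why that guarantee matters: with standard SRC_ALPHA / INV_SRC_ALPHA blending the blend math is order-dependent, so two overlapping points only produce a stable result because the API pins their order:

// output merger blend: dest' = src.rgb * src.a + dest.rgb * (1 - src.a)
// two half-alpha points over a black target:
//   red then blue:  after red = (0.5, 0, 0), after blue = (0.25, 0, 0.5)
//   blue then red:  after blue = (0, 0, 0.5), after red = (0.5, 0, 0.25)
// different final pixels, so without the primitive-order guarantee the
// image could flicker between the two from frame to frame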

Once you start dealing with direct writes to buffers/textures through unordered access views, you are correct in assuming that all bets are off (this is why they're called “unordered”, since there are no ordering guarantees within a single Draw or Dispatch). You need to rely on other means to get things in the order that you want, or rely on algorithms that aren't sensitive to the order of the writes (atomics are popular for this sort of thing). The one exception is the relatively new feature called “rasterizer ordered views”, or ROVs for short: when you use them, they ensure that writes to a buffer or texture from a pixel shader are primitive-ordered for any threads that execute for the same pixel coordinate. The main use cases are OIT algorithms and voxelization.
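As an example of an order-insensitive algorithm (a sketch, all names are placeholders): a commutative atomic like InterlockedMax produces the same final value no matter which thread's write lands first:

StructuredBuffer<float3> points;     // placeholder: xy texel coords + depth
RWTexture2D<uint> depthMax;          // placeholder UAV (R32_UINT)

[numthreads(64, 1, 1)]
void CS(uint3 dtid : SV_DispatchThreadID)
{
    float3 p = points[dtid.x];
    // max is commutative, so the result doesn't depend on write order
    // (asuint is an order-preserving key here as long as depth is non-negative)
    InterlockedMax(depthMax[uint2(p.xy)], asuint(p.z));
}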

@mjp

well that's odd, in that link it states:

“This enables Order Independent Transparency (OIT) algorithms to work, which give much better rendering results when multiple transparent objects are in line with each other in a view.”

OIT works well and gives good transparency results without ROVs, so this statement is unclear… in other words, the early OIT techniques didn't need ROVs when they had their breakthrough… so why did anyone think that ROVs were needed for OIT??

what am I missing?

There are many variants of OIT: it's really a family of techniques, rather than just one. Some require specific pixel shader ordering, and some do not. This causes these techniques to have different tradeoffs in terms of performance, quality, and memory usage. Look up adaptive OIT for an example of an algorithm that was specifically designed to make use of the ordering guarantees provided by ROVs.
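For reference, the HLSL side of an ROV looks roughly like this (a sketch; needs hardware/driver support for rasterizer ordered views, names are placeholders):

// pixel shader accesses to a rasterizer ordered view are serialized
// in primitive order for invocations that touch the same pixel
RasterizerOrderedTexture2D<float4> accumTex;

void PS(float4 pos : SV_POSITION, float4 color : COLOR0)
{
    uint2 coord = uint2(pos.xy);
    // this read-modify-write is safe: overlapping primitives see it in order
    float4 prev = accumTex[coord];
    accumTex[coord] = color + prev * (1.0f - color.a);   // premultiplied "over"
}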

aaaah right ok @mjp, makes perfect sense now;

thanks for enlightening

