I was wondering if there is a speed penalty for using an append or consume buffer instead of a structured buffer. How are they implemented? I don't really have any experience with them, and I'm wondering whether I should design around a structured buffer with an extra pass to get rid of "empty entries", or just use an append buffer.
append/consume buffers, speed penalty?
They're basically the same as having one structured buffer, plus one RW buffer that contains a single integer. When you append into the buffer you're actually doing an atomic increment on that integer to find out which index to write into the structured buffer.
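To make that concrete, here is a rough CPU-side analogy in C++. It is only a sketch of the idea described above, not how any driver actually implements it: `AppendBuffer`, `append`, and `count` are made-up names, `std::vector` stands in for the structured buffer, and `std::atomic` stands in for the single-integer RW buffer. Dropping writes when the buffer is full is also just an assumption made for this sketch.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU-side stand-in for an append buffer (not the D3D API).
template <typename T>
struct AppendBuffer {
    std::vector<T> elements;             // plays the role of the structured buffer
    std::atomic<std::uint32_t> counter;  // plays the role of the single-integer RW buffer

    explicit AppendBuffer(std::size_t capacity)
        : elements(capacity), counter(0) {}

    // Append: atomically increment the counter to reserve an index,
    // then write the value at that index.
    // Assumption for this sketch: a write past the capacity is dropped.
    bool append(const T& value) {
        std::uint32_t index = counter.fetch_add(1, std::memory_order_relaxed);
        if (index >= elements.size()) {
            return false;
        }
        elements[index] = value;
        return true;
    }

    // Number of valid entries, clamped to the capacity because the counter
    // keeps incrementing even when an append is dropped.
    std::size_t count() const {
        std::size_t reserved = counter.load(std::memory_order_relaxed);
        return std::min(reserved, elements.size());
    }
};
```

Calling `append` only for the entries you want to keep gives you a densely packed `elements` array of `count()` entries, so there are no "empty entries" to remove afterwards. The cost is one atomic increment per appended element, and the order of the entries is not deterministic when many threads append at once.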
The DX11 version usually implements the counter in the global data storage (GDS). It is like the local data storage of your compute shaders, and is limited to 64K. (I believe you can access it with CUDA/OpenCL, but not with DirectX.)
The DX12 version is explicit: you create the counter buffer yourself, so in theory there is no GDS for you. The driver is still free to step in behind the scenes, though, and who knows whether it does or not.
11 hours ago, galop1n said: The dx12 version is explicit, you create the counter buffer yourself, so in theory, no GDS for you, but the driver is still free to sneak in between, but who knows if it does it or not
Is there a resource where I can read up on this? Is this SM5.1/6.0? I have a resource describing the basics of the dx11 version.
15 minutes ago, Infinisearch said: Is there a resource where I can read up on this? Is this SM5.1/6.0? I have a resource describing the basics of the dx11 version.
This is shader model independent, and the only documentation you need is CreateUnorderedAccessView: https://msdn.microsoft.com/en-us/library/windows/desktop/dn788674(v=vs.85).aspx