D3D12 Fence and/or Resource Barriers between command queues


Started by AlexandreMutel May 18, 2021 05:55 AM

4 comments, last by AlexandreMutel 2 years, 11 months ago

AlexandreMutel

1,106

Author

May 18, 2021 05:55 AM

Hello!

I have been reading some material about D3D12 resource barriers, but it is unclear to me how they work when a resource is used across different command queues.

Let's say we have a simple scenario of accessing read/write between command queues:

1. The graphics queue writes to UAV1.
2. The compute queue reads from UAV1 and writes to UAV2.
3. The graphics queue reads from UAV2.

From the D3D12 documentation on “Executing and Synchronizing Command Lists” (Win32 apps, Microsoft Docs):

When a resource has transitioned to a writeable state on a queue, it is considered exclusively owned by that queue, and must transition to a read or COMMON state (refer to D3D12_RESOURCE_STATES) before it can be accessed by another queue.

Does this mean that at the end of step 1, we need to transition UAV1 from a writable state to a read/COMMON state so that step 2 (on the compute queue) can safely read UAV1? Could we instead do the transition of UAV1 to read/COMMON at the beginning of step 2? In other words, do resource barriers work as expected between command queues?

In the following section, “Synchronizing command list execution using command queue fences”, the D3D12 MSDN docs imply that fences should be used to synchronize work between different command queues (e.g. if compute depends on the graphics queue). But then I don't get it: if resource barriers are supposed to work between command queues, do we also need fences to synchronize between command queues?

@xoofx

MJP

20,296

May 19, 2021 05:04 AM

The barrier/fence situation is pretty confusing because barriers take a “resource state” abstraction and use it to wrap up thread synchronization, cache actions, and layout changes (compression and decompression), which is what leads to some of the wacky rules you're seeing. For your UAV across multiple queues scenario, if you're dealing with a texture that wasn't created with D3D12_RESOURCE_FLAG_ALLOW_SIMULTANEOUS_ACCESS, then those rules apply and you need to make sure to transition the texture back to a readable state on the same queue that wrote to it. I believe this is required so that any necessary decompression steps can be inserted by the driver before another queue tries to read from it. Buffers and SIMULTANEOUS_ACCESS textures don't get any compression, and so you're left with the less-restrictive “one writer and multiple readers as long as you're not writing to the same memory being read from” rule.
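To make the ownership rule above concrete, here is a minimal toy model in plain C++. It does not call the D3D12 API: `Resource`, `State`, `transition`, `canRead`, and `demoOwnershipRule` are hypothetical names invented for illustration, and queue ids 0 and 1 merely stand in for a graphics and a compute queue. It only encodes the rule as described in this thread (a resource in a writable state is owned by the queue that wrote to it, unless it is a buffer or a SIMULTANEOUS_ACCESS texture), so treat it as a sketch rather than a description of what a driver or the debug layer actually checks.

```cpp
#include <cassert>

// Toy model only: none of these types exist in D3D12.
enum class State { Common, Read, Write };

struct Resource {
    // True for buffers and SIMULTANEOUS_ACCESS textures, which the thread
    // describes as exempt from the exclusive-ownership rule.
    bool exempt = false;
    State state = State::Common;
    int owner = -1;  // queue id holding the resource in a writable state, or -1
};

// Record a state transition on `queue`. Returns false when the modeled rule
// is violated: only the queue that owns a writable resource may transition it.
bool transition(Resource& r, int queue, State next) {
    if (!r.exempt && r.state == State::Write && r.owner != queue) {
        return false;
    }
    r.state = next;
    r.owner = (next == State::Write) ? queue : -1;
    return true;
}

// May `queue` read the resource in its current state under the modeled rule?
bool canRead(const Resource& r, int queue) {
    if (r.exempt) return true;  // overlap of written/read memory is not modeled
    return r.state != State::Write || r.owner == queue;
}

// Walks through step 1 -> step 2 of the scenario for a non-exempt texture.
bool demoOwnershipRule() {
    const int graphics = 0, compute = 1;
    Resource uav1;
    bool wrote = transition(uav1, graphics, State::Write);  // step 1 writes UAV1
    assert(wrote);
    bool earlyRead = canRead(uav1, compute);  // compute reads too early
    bool lateBarrier = transition(uav1, compute, State::Read);  // wrong queue
    bool handedOver = transition(uav1, graphics, State::Read);  // writer's barrier
    return !earlyRead && !lateBarrier && handedOver && canRead(uav1, compute);
}
```

In this model, putting the transition at the beginning of step 2 (on the compute queue) is rejected, while putting it at the end of step 1 (on the graphics queue) makes UAV1 readable from compute. Cross-queue ordering is deliberately left out here, since that part comes from fences.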

Fences in D3D12 are really all about ordering your command buffer submissions. They tend to make more sense if you think of them as being part of the OS scheduler that determines when a command buffer can start executing, and which command buffers can execute simultaneously on two different hardware engines*. D3D12 exposes “virtualized” queues and not real HW queues, which means it can do things like take two submissions to two different D3D12 queues and “flatten” them into back-to-back serialized submissions on a single HW engine (this is also why you can't have a fence in the middle of your command buffer). So you always need fences when multiple queues are involved, since the scheduler uses them to figure out what order your command buffers will run in. Barriers are really about intra-queue sync and layout changes: that last barrier to a readable state isn't really synchronizing between queues (that comes from the fence), it's just about getting the resource into the right state so that another queue can work with it.

*Note that there's a new experimental “Hardware Accelerated GPU Scheduling” feature that moves some of these things out of the OS scheduler and onto the GPU itself.
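The scheduler view of fences described above can be sketched with a small single-threaded simulation in standard C++. This is a toy model under stated assumptions, not D3D12 code: `Fence`, `Op`, `Queue`, `schedule`, and `runScenario` are hypothetical names, and the method names only loosely mirror the real `ID3D12CommandQueue::ExecuteCommandLists`, `Signal`, and `Wait` calls. The "scheduler" round-robins over the queues and lets a queue advance only when its next operation is not a wait on a fence value that has not been reached yet.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <string>
#include <vector>

// Toy fence: just a monotonically increasing value.
struct Fence { std::uint64_t value = 0; };

struct Op {
    enum Kind { Execute, Signal, Wait };
    Kind kind;
    std::string name;     // used by Execute only
    Fence* fence;         // used by Signal / Wait
    std::uint64_t value;  // used by Signal / Wait
};

// Toy queue: an ordered list of submissions and fence operations.
struct Queue {
    std::deque<Op> ops;
    void ExecuteCommandList(const std::string& name) {
        ops.push_back(Op{Op::Execute, name, nullptr, 0});
    }
    void Signal(Fence& f, std::uint64_t v) { ops.push_back(Op{Op::Signal, "", &f, v}); }
    void Wait(Fence& f, std::uint64_t v) { ops.push_back(Op{Op::Wait, "", &f, v}); }
};

// Toy scheduler: returns the order in which command lists start executing.
std::vector<std::string> schedule(std::vector<Queue*> queues) {
    std::vector<std::string> order;
    bool progressed = true;
    while (progressed) {
        progressed = false;
        for (Queue* q : queues) {
            assert(q != nullptr);
            if (q->ops.empty()) continue;
            Op& op = q->ops.front();
            // A queue whose next op waits on an unreached fence value is stalled.
            if (op.kind == Op::Wait && op.fence->value < op.value) continue;
            if (op.kind == Op::Execute) order.push_back(op.name);
            if (op.kind == Op::Signal) op.fence->value = op.value;
            q->ops.pop_front();
            progressed = true;
        }
    }
    return order;
}

// The three-step scenario from the original question.
std::vector<std::string> runScenario() {
    Fence fence;
    Queue graphics, compute;
    // Step 1; the barrier of UAV1 to a readable state would be recorded at the
    // end of this command list, on the queue that wrote to it.
    graphics.ExecuteCommandList("graphics: write UAV1");
    graphics.Signal(fence, 1);
    compute.Wait(fence, 1);  // step 2 must not start before step 1 finishes
    compute.ExecuteCommandList("compute: read UAV1, write UAV2");
    compute.Signal(fence, 2);
    graphics.Wait(fence, 2);  // step 3 must not start before step 2 finishes
    graphics.ExecuteCommandList("graphics: read UAV2");
    // The compute queue is listed first on purpose: the fence waits, not the
    // visiting order, decide what runs when.
    return schedule({&compute, &graphics});
}
```

Calling `runScenario()` yields the three command lists in step order even though the compute queue is visited first. Removing the two `Wait` calls would let the toy scheduler start the compute work before the graphics write, which is the kind of reordering the fences exist to prevent.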


AlexandreMutel

1,106

Author

May 19, 2021 05:41 AM

Thanks a lot for the details @mjp, that makes perfect sense.

Also, just to confirm: within the same command list/command queue, if two consecutive draws/dispatches have no resource dependencies/barriers between them (e.g. write to read), the driver can freely decide to execute them concurrently, so barriers do also have an impact on scheduling?

Bonus question while I have you around: any plan for a “Practical Rendering & Computation with Direct3D12” book?

@xoofx

MJP

20,296

May 19, 2021 06:43 AM

Yes, that's right: draws and dispatches can (and usually will) overlap if there are no barriers in between them. If you take a GPU capture in PIX or grab a trace in Nvidia's or AMD's profilers, you can see how the draws and dispatches are actually executing. PIX will show you something like this:

and Radeon GPU Profiler will show you this:

I wrote wayy too much about these things on my blog if you're ever looking for some not-so-light reading material. If you've worked with consoles you'll probably know a lot of these things already, but in parts 1 and 5 I write a bit about how D3D12 abstracts some of the GPU specifics that you're probably already familiar with.

As for a D3D12 version of the book…it's come up, but lately I just can't find the time and energy to write another book like that. It would be fun to do, though, if I could take a few months off from my day job.


AlexandreMutel

1,106

Author

May 19, 2021 07:31 PM

MJP said:
Yes, that's right: draws and dispatches can (and usually will) overlap if there are no barriers in between them. If you take a GPU capture in PIX or grab a trace in Nvidia's or AMD's profilers, you can see how the draws and dispatches are actually executing. PIX will show you something like this:

Yep, indeed, that's what I recall from working with Direct3D11. My question was mainly to validate my understanding, since I got some comments on Twitter that were more confusing/misleading (e.g. that resource barriers are not about synchronization or execution).

MJP said:
I wrote wayy too much about these things on my blog if you're ever looking for some not-so-light reading material. If you've worked with consoles you'll probably know a lot of these things already, but in parts 1 and 5 I write a bit about how D3D12 abstracts some of the GPU specifics that you're probably already familiar with.

Definitely! I had already taken a look before coming here, but I should read it again! Super detailed, very interesting.

MJP said:
As for a D3D12 version of the book…it's come up, but lately I just can't find the time and energy to write another book like that. It would be fun to do, though, if I could take a few months off from my day job.

Hehe, yeah, I completely understand. I would never have the courage to write a book!

There are already plenty of pretty good resources on the Internet. Still, I'm a bit surprised that most of the talks about Direct3D12 (e.g. at GDC, papers, etc.) came out around 2016 when Direct3D12 came out, like “D3D12 & Vulkan: Lessons Learned”, and I would have thought that over the following years we would have gotten additional informed feedback and updates about these “lessons learned”… but I haven't seen any…

@xoofx
