
DX11 Render Target Picking

Started by August 10, 2019 07:35 PM
11 comments, last by binder2 5 years, 5 months ago

I'm using DirectX 11 (SlimDX, C#) to write an editor-type tool (like Maya). As there will be a lot of complex objects on screen, I figured I would do the picking in the following manner.

- Have all shaders write to two render targets, the first being the normal colour target, and the second being a R32_UInt texture to store pixel IDs (object ids)

- Then have this second texture read back by the CPU and copied into a map for easy picking via cursor coordinates

 

The problem I am having is that it is rather slow.

At the end of each frame, I use CopyResource to copy the picking texture (BindFlags.RenderTarget) to a 'ResourceUsage.Staging' texture

Then I call MapSubresource, and copy it all to a simple array of uints (which are used for the actual picking detection).

 

It takes about 10ms though :( ... is there a faster way? The CopyResource call appears to be fine; it is the Map/Unmap calls that take the time.

 

My code looks something like this :


void EndOfRenderFrame()
{
    // copy the picking render target to the staging texture (as we can't map a render target directly)
    D3DDevice.ImmediateContext.CopyResource(m_pickTexture, m_pickStagingTexture);

    // now copy the staging texture to our local copy of the picking buffer
    DataBox map = D3DDevice.ImmediateContext.MapSubresource(m_pickStagingTexture, 0, MapMode.Read, MapFlags.None);

    // destination pitch is in uints (one R32_UInt per texel); map.RowPitch is in bytes, hence the / 4
    int destinationPitch = m_pickStagingTexture.Description.Width;
    for (int i = 0; i < m_pickStagingTexture.Description.Height; ++i)
    {
        Copy(map.Data.DataPointer, i * (map.RowPitch / 4), m_pickMap.Data, i * destinationPitch, destinationPitch);
    }

    D3DDevice.ImmediateContext.UnmapSubresource(m_pickStagingTexture, 0);
}

//--------------------------------------------------------------------------
// Copies 'length' uints from an unmanaged pointer (offsets in uints) into a managed array.
public static void Copy(IntPtr source, int sourceOffset, uint[] destination, int destinationOffset, int length)
{
    unsafe
    {
        uint* sourcePtr = ((uint*)source) + sourceOffset;
        for (int i = destinationOffset; i < destinationOffset + length; ++i)
        {
            destination[i] = *sourcePtr++;
        }
    }
}

 

 

The problem is using MapMode.Read to read the current frame's resource. That forces a CPU-GPU sync: the call stalls until the GPU has finished rendering your current frame and completed the copy. If you want to read a GPU resource without introducing that sync, you should read from a resource that you know the GPU has already finished with. There is a flag you can supply to the Map function, D3D11_MAP_FLAG_DO_NOT_WAIT, which makes the call immediately return the error code DXGI_ERROR_WAS_STILL_DRAWING if the GPU is not yet finished with the resource. Alternatively, instead of trying to read the resource right away, you could double buffer it and read from the previous frame's copy to avoid stalling the CPU.


As the above poster says, you're basically telling it to copy and then immediately forcing a full stall until that copy completes. Ideally, set up a rotating buffer of three read targets, and always map the oldest one. Your pick results will be slightly out of date, but you won't stall.
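A sketch of that rotating scheme in SlimDX terms, reworking the poster's EndOfRenderFrame. The ring size, field names (m_stagingRing, m_frame) and fill-up check are illustrative, not the thread's actual code; it assumes the textures were created with ResourceUsage.Staging and CpuAccessFlags.Read:

```csharp
// Ring of staging textures; the size should be at least the maximum frame
// latency, so the slot being mapped is no longer in flight on the GPU.
const int RING_SIZE = 3;
Texture2D[] m_stagingRing = new Texture2D[RING_SIZE]; // created elsewhere as ResourceUsage.Staging
int m_frame = 0;

void EndOfRenderFrame()
{
    var context = D3DDevice.ImmediateContext;

    // Write this frame's picking data into the current slot...
    int writeIndex = m_frame % RING_SIZE;
    context.CopyResource(m_pickTexture, m_stagingRing[writeIndex]);

    // ...and read back the oldest slot, which the GPU finished frames ago.
    int readIndex = (m_frame + 1) % RING_SIZE;
    if (m_frame >= RING_SIZE - 1) // skip reads until the ring has filled once
    {
        DataBox map = context.MapSubresource(m_stagingRing[readIndex], 0, MapMode.Read, MapFlags.None);
        // ... copy rows out as before, honouring map.RowPitch ...
        context.UnmapSubresource(m_stagingRing[readIndex], 0);
    }

    m_frame++;
}
```

The pick results are RING_SIZE - 1 frames stale, which is normally imperceptible for editor-style picking.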

SlimDX | Ventspace Blog | Twitter | Diverse teams make better games. I am currently hiring capable C++ engine developers in Baltimore, MD.

Thanks, I will try that.

One question though: I enabled MapFlags.DoNotWait, but do you know how I detect DXGI_ERROR_WAS_STILL_DRAWING in SlimDX, as there is no HRESULT like in C++?

I test for DataBox.Data.CanRead, but that is always true.

 

 

 

Unfortunately I can't help with SlimDX, but if you defer reading from the resource by more frames than your backbuffer count, you can be sure that the GPU was finished with it.

Ok - I have 3 rotating buffers in place and it does improve things, although I am noticing a lot of spikes now.

My frame looks something like this :
 


void Frame()
{
    Render();
    D3DDevice.ImmediateContext.Flush();
    D3DImage.Invalidate(); // (I am using D3DImage so I can render to a WPF control)
}

 

With picking disabled, my frame time is a steady 4ms.

With picking enabled and no rotating buffers, it's a steady 11ms

With picking enabled and 3 rotating buffers, it varies between 4ms and 8ms. Is that a sign something is wrong?

 

6 hours ago, turanszkij said:

but if you defer reading from the resource by more frames than your backbuffer count, you can be sure that the GPU was finished with it.

Back buffer count isn't the right value here, you're looking for maximum frame latency, which is the number of frames that can be queued to the GPU before the CPU starts waiting. This defaults to 3.
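In native D3D11 that latency is queried and set through IDXGIDevice1::SetMaximumFrameLatency. I'm not certain what SlimDX calls the wrapper, but something along these lines (the Device1 construction spelling is an assumption, as is matching the value to the staging ring size):

```csharp
// Assumption: SlimDX exposes IDXGIDevice1 as SlimDX.DXGI.Device1 and lets you
// obtain it from the D3D11 device; the exact spelling may differ per version.
using (var dxgiDevice = new SlimDX.DXGI.Device1(D3DDevice))
{
    // Pin the latency so you know exactly how many staging buffers to rotate.
    dxgiDevice.MaximumFrameLatency = 3;
}
```

Pinning the value explicitly means you aren't relying on the driver default of 3 staying true.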

15 hours ago, SoldierOfLight said:

Back buffer count isn't the right value here, you're looking for maximum frame latency, which is the number of frames that can be queued to the GPU before the CPU starts waiting. This defaults to 3.

Interesting. And what happens when you have 2 back buffers and a maximum frame latency of 3? Isn't 2 then the maximum number of frames that can be queued up? Would it make sense to take the minimum of the maximum frame latency and the back buffer count to determine how long we need to wait for a resource to be finished on the GPU?

Why not just do picking against the geometry on the CPU? I think it'll be a lot faster than 4-10ms :)

Picking is probably not something you need done hundreds of times per second, so does it "matter" enough to optimise it with the GPU round trip?

.:vinterberg:.

Since the OP probably only cares about the data that's under the mouse cursor it would make a lot more sense to optimise the process around that knowledge. If I were implementing GPU-picking with a CPU read-back I would:

  1. Provide the GPU with the mouse coordinate XY via a constant buffer.
  2. Bind an Append Buffer UAV to the Pixel Shader stage with enough space for holding as many 'picks' under the mouse cursor as you might ever expect to have due to overdraw. Something in the region of 32 sounds reasonable.
  3. In the Pixel Shader, check if SV_POSITION.xy == MouseXY and then simply append a pair of values (Depth and Object ID) to the list. Be sure to add [earlydepthstencil] to the shader so you don't lose the EarlyZ depth testing optimisation.
  4. CopyResource the ~32 values to the Staging resource.
  5. Map that small buffer instead and find the one with the object ID with the lowest depth value. Optionally sort them by depth and you have yourself a sorted list of hits under the cursor.
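The shader side of steps 1-3 might look something like this HLSL sketch (the register slots and the names MouseXY, ObjectID, PickEntry and PSMain are illustrative, not actual code from the thread):

```hlsl
cbuffer PickCB : register(b1)
{
    uint2 MouseXY;   // step 1: cursor coordinate supplied by the CPU
    uint  ObjectID;  // current object's id (could also live in a per-object cbuffer)
};

struct PickEntry { float depth; uint objectId; };

// step 2: append buffer UAV bound to the pixel shader stage alongside the render target
AppendStructuredBuffer<PickEntry> PickResults : register(u1);

[earlydepthstencil] // keep early-Z despite the UAV write
float4 PSMain(float4 svPos : SV_POSITION) : SV_Target
{
    // step 3: only the fragment under the cursor appends a hit
    if (all(uint2(svPos.xy) == MouseXY))
    {
        PickEntry e;
        e.depth    = svPos.z;
        e.objectId = ObjectID;
        PickResults.Append(e);
    }
    return float4(1, 1, 1, 1); // placeholder for the shader's normal colour output
}
```

Note the all() around the comparison: comparing uint2s yields a bool2, which must be collapsed before branching.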

This approach has a number of advantages:

  1. Memory - you don't need an entire extra render target to hold object IDs for all the pixels you don't care about.
  2. Speed (probably, profile it) - you're not writing out this entire extra render target's worth of data and you're only copying back about ~256 bytes of data over the slow PCI-E bus as opposed to 8-32MB.
  3. Flexibility - if at a later date you decide you want to see picks behind the front-most object, you have a full list of all picks on all objects that lie under the mouse cursor rather than just the top-most one.

Using your method, you might also want to think about using CopySubresourceRegion to copy back just the one texel you're interested in rather than all of the texels.
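In SlimDX terms, copying back just that one texel might look roughly like this (mouseX/mouseY are illustrative, the staging texture is assumed to be created 1x1, and the exact CopySubresourceRegion overload may differ by SlimDX version):

```csharp
// Copy only the texel under the cursor instead of the whole render target.
// ResourceRegion bounds are half-open ([Left, Right) etc.), so 1x1 means +1.
var region = new ResourceRegion
{
    Left = mouseX, Right  = mouseX + 1,
    Top  = mouseY, Bottom = mouseY + 1,
    Front = 0,     Back   = 1,
};
D3DDevice.ImmediateContext.CopySubresourceRegion(
    m_pickTexture, 0, region,   // source: the picking render target
    m_pickStagingTexture, 0,    // destination: a tiny (1x1) staging texture
    0, 0, 0);                   // written at the staging texture's origin
```

Mapping a 4-byte staging texture is dramatically cheaper than mapping the full-screen one.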

PCI-E 3.0 at 16x has a peak bandwidth of 15.75GB/s. If you're copying back a 2160p R32 surface (32MB) that's going to take at least 2ms for the GPU to complete - so the less data you can transfer back the better. 

Adam Miles - Principal Software Development Engineer - Microsoft Xbox Advanced Technology Group

This topic is closed to new replies.
