
Constant buffer UpdateSubresource vs Map

Started by March 30, 2012 05:52 PM
4 comments, last by satchmo 12 years, 9 months ago
I'm trying to determine the use cases for Map/Unmap versus UpdateSubresource when it comes to updating constant buffers. I know there have been a few threads on this topic, but none of them has answered the question satisfactorily.

Here's how I understand Map() and UpdateSubresource(). Someone please correct me if I'm wrong about something:

  • Map(): You get a pointer directly to the driver's memory for that resource and can write to it without incurring any additional system-memory copy. It may stall if you map with _DISCARD (D3D11_MAP_WRITE_DISCARD) while the resource is in use by the GPU and the driver has run out of temporary buffers. Good for gathering inputs from various memory locations and writing them straight into the buffer.
  • UpdateSubresource(): Copies the whole structure from an area of system memory to the driver's memory for that resource. Useful when you already have a complete copy of the buffer contents in memory.
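
To make the two bullets above concrete, here is a minimal CPU-only sketch of the two write patterns. It does not touch a real device: `dst` is a stand-in for the memory the API would write to, and `PerObjectConstants`, `WriteGathered` and `WriteWhole` are hypothetical names invented for illustration. The D3D11 calls each pattern corresponds to are shown in comments; for the Map path the buffer would need to be created with D3D11_USAGE_DYNAMIC and D3D11_CPU_ACCESS_WRITE.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical per-object constants; the layout is an assumption.
struct PerObjectConstants {
    float world[16];  // 4x4 world matrix
    float tint[4];    // RGBA tint
};

// Map-style update: gather values from separate locations and write them
// straight into the destination, with no intermediate struct.
// In D3D11, `dst` would be D3D11_MAPPED_SUBRESOURCE::pData, obtained with
//   context->Map(cb, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped);
// and released afterwards with
//   context->Unmap(cb, 0);
void WriteGathered(void* dst, const float* world16, const float* tint4) {
    assert(dst && world16 && tint4);
    unsigned char* bytes = static_cast<unsigned char*>(dst);
    std::memcpy(bytes + offsetof(PerObjectConstants, world),
                world16, sizeof(float) * 16);
    std::memcpy(bytes + offsetof(PerObjectConstants, tint),
                tint4, sizeof(float) * 4);
}

// UpdateSubresource-style update: the caller already holds a complete
// struct and hands over one contiguous block. In D3D11 this would be
//   context->UpdateSubresource(cb, 0, nullptr, &src, 0, 0);
// where the runtime performs the copy itself.
void WriteWhole(void* dst, const PerObjectConstants& src) {
    assert(dst);
    std::memcpy(dst, &src, sizeof(src));
}
```

Both functions leave the same bytes in the destination. The difference is only whether the data had to be assembled into a `PerObjectConstants` somewhere else first.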


So based on this understanding, I have a few questions:

    1. What's the general limit of temporary buffers that are used when using _DISCARD with Map()? Tens? Hundreds? I assume it's different per driver, but would like at least some general guidelines.
    2. What happens when you use UpdateSubresource() and the resource is being used by the GPU? Is it subject to the same process of using temporary buffers and possibly stalling if the driver runs out of them? Take an example of five objects drawn one after another, using the same shared constant buffer, where the constant buffer needs to be updated for each object. Using Map() for each update, every update after the first will cause the driver to use a temporary buffer, because the GPU will not yet have consumed the previous contents. What will happen when using UpdateSubresource() in this case?
    3. How is it advantageous to use one over the other? Is it purely how much data is being moved around? As far as I can tell, the same amount of memory is being copied around no matter which method you use - but I suspect I'm missing something here.
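
The five-object scenario in question 2 can be sketched as a toy model. This is not how any particular driver works: the class `DiscardRenameModel`, its pool size, and the rule "stall when the pool is empty" are all assumptions made up for illustration, based only on the description of temporary buffers above.

```cpp
#include <cassert>
#include <cstddef>

// Toy model only. Real drivers do not expose their pool, and the pool
// sizes used with this class are invented numbers.
class DiscardRenameModel {
public:
    explicit DiscardRenameModel(std::size_t poolSize) : free_(poolSize) {
        assert(poolSize > 0);
    }

    // Models one Map(WRITE_DISCARD) + Unmap + Draw sequence. The update
    // takes a fresh temporary buffer so the CPU never writes memory that
    // a pending draw may still read. Returns true if the call had to
    // wait for the GPU first (a stall).
    bool UpdateAndDraw() {
        bool stalled = false;
        if (free_ == 0) {
            GpuRetireOldest();  // pool exhausted: wait for the oldest draw
            stalled = true;
        }
        --free_;
        ++inFlight_;
        return stalled;
    }

    // Models the GPU finishing its oldest pending draw, which returns
    // that draw's temporary buffer to the pool.
    void GpuRetireOldest() {
        if (inFlight_ > 0) {
            --inFlight_;
            ++free_;
        }
    }

private:
    std::size_t free_;
    std::size_t inFlight_ = 0;
};

// Submits `draws` updates back to back, with the GPU never catching up
// on its own, and counts how many of them stalled.
std::size_t CountStalls(std::size_t poolSize, std::size_t draws) {
    DiscardRenameModel model(poolSize);
    std::size_t stalls = 0;
    for (std::size_t i = 0; i < draws; ++i) {
        if (model.UpdateAndDraw()) ++stalls;
    }
    return stalls;
}
```

In this model, five draws against a pool of 8 never stall, while the same five draws against a pool of 2 stall three times. How large the real pool is on a given driver is exactly what question 1 asks.
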
NVIDIA recently recommended using Map + DISCARD for updating constant buffers, and said that their driver is capable of handling a very large number of small constant buffers. In my experience this seems to be good advice. I've used this approach for updating lots of constant buffers and it seems to work just fine. If you call UpdateSubresource on a resource that is in use, then I believe the data will end up getting copied into the command buffer so that it can later be copied into the resource asynchronously. This is obviously not a good thing.

The presentation I mentioned is here, if you're interested (it's the first one, about managing buffers).
I'd generally use Map + discard because when updating a cbuffer you'll be throwing away the entire contents of the old copy; Map + discard is more likely to be optimized by drivers for that kind of usage pattern.

I have experimented some with using a pool of cbuffers but have found that changing buffers incurs more overhead than just discarding the previous contents.

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

Ah, awesome, thanks guys. That's good to know about UpdateSubresource() copying the data into the command buffer. You say it's not a good thing because of the extra copying that would be required, right?

So take this example: I have 1000 identical objects that all need unique values in their constant buffer. If I understand correctly, it would be just fine for me to create just one constant buffer for all of them to share, and use map/discard to update the constant buffer before each draw call. Is there any advantage to creating 1000 constant buffers and each object getting a unique one?

Is there any advantage to creating 1000 constant buffers and each object getting a unique one?


That's something you really need to profile - depending on your cbuffer sizes it may or may not be better. I've tested this exact kind of setup myself with 400 objects and 400 cbuffers each containing just a matrix and some lighting info (maybe 12 extra floats) and it was substantially slower - in my case because the overhead of changing cbuffers for each object was much higher than the overhead of just using map/discard on a single cbuffer. Your case may be different.
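
For a sense of the sizes involved, here is a small sketch of a cbuffer like the one described above: a matrix plus 12 extra floats. The struct name and field layout are assumptions, since the actual layout used in that test is not given, and the code assumes a 4-byte `float`.

```cpp
#include <cstddef>

// Hypothetical layout: one 4x4 matrix plus 12 floats of lighting data.
struct PerObjectConstants {
    float world[16];     // 4x4 world matrix
    float lighting[12];  // "maybe 12 extra floats"
};

static_assert(sizeof(float) == 4, "sketch assumes 4-byte floats");
static_assert(sizeof(PerObjectConstants) == 112, "28 floats * 4 bytes");
// D3D11 requires a constant buffer's ByteWidth to be a multiple of 16.
static_assert(sizeof(PerObjectConstants) % 16 == 0,
              "pad the struct if this fires");

// Total bytes needed if every object gets its own constant buffer.
constexpr std::size_t TotalBytes(std::size_t objectCount) {
    return objectCount * sizeof(PerObjectConstants);
}
```

Under these assumptions each buffer is 112 bytes, and 400 separate buffers hold 44,800 bytes in total. The memory footprint is small either way, which fits the observation that the cost came from switching buffers rather than from the amount of data.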


Makes sense alright. Thanks again.

