Direct3D 11 instance data not copying correctly

2 comments, last by Juliean 2 years, 3 months ago

I am making my first attempt at D3D's `DrawInstanced()` and I've reached a point where I'm genuinely stumped after a couple of days. My setup is comparatively simple: only intended for drawing zillions of quads into a space using an orthographic projection. Instead, the behavior I'm seeing is that it only draws the first quad of the batch.

First, here's the data structure that becomes the vertex shader's `cbuffer`:

   struct PerObjectData {
       glm::mat4 world_matrix;
       glm::vec4 color;
       uint32_t  texture_index;
       float     tiling_factor;

       char padding[8];
   };

And here's how it looks shader side:

   cbuffer PerObjectData {
       float4x4 world_matrix;
       float4   color;
       uint     texture_index;
       float    tiling_factor;
   };
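For anyone wanting to double-check that the two declarations above line up, here is a minimal sketch of a compile-time layout check. It is not from the original code: `PerObjectDataPlain` is a hypothetical stand-in that swaps `glm::mat4` and `glm::vec4` for plain float arrays (assuming those glm types are 16 and 4 tightly packed floats), and the expected offsets assume the usual HLSL rule that a `cbuffer` is laid out in 16-byte registers.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the post's PerObjectData. Plain float arrays
// replace glm::mat4 / glm::vec4 so the sketch compiles without glm.
struct PerObjectDataPlain {
    float    world_matrix[16]; // float4x4: four 16-byte registers
    float    color[4];         // float4:   one register
    uint32_t texture_index;    // uint:     first 4 bytes of the last register
    float    tiling_factor;    // float:    next 4 bytes of the last register
    char     padding[8];       // fills the rest of the last register
};

// Offsets the shader side is assumed to use for the same members.
static_assert(offsetof(PerObjectDataPlain, color) == 64, "color offset");
static_assert(offsetof(PerObjectDataPlain, texture_index) == 80, "texture_index offset");
static_assert(offsetof(PerObjectDataPlain, tiling_factor) == 84, "tiling_factor offset");

// The struct should fill whole 16-byte registers: 6 registers, 96 bytes.
static_assert(sizeof(PerObjectDataPlain) == 96, "struct size");
static_assert(sizeof(PerObjectDataPlain) % 16 == 0, "16-byte multiple");
```

If these asserts hold for the real struct too, the layout itself is probably not what is going wrong.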

During initialization, I set up what's necessary, including the above:

   //
   // Per Object Buffer
   //

   UINT perobject_buffer_size = sizeof(PerObjectData) * _batch.max_quads;
   D3D11_BUFFER_DESC perobject_buffer_desc { };
   perobject_buffer_desc.Usage          = D3D11_USAGE_DYNAMIC;
   perobject_buffer_desc.BindFlags      = D3D11_BIND_CONSTANT_BUFFER;
   perobject_buffer_desc.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
   perobject_buffer_desc.ByteWidth      = perobject_buffer_size;

   hr = device->CreateBuffer(&perobject_buffer_desc, 0, &perobject_buffer);
   if(FAILED(hr)) {
       PDR_ENGINE_ERROR("Failed to create world matrix buffer: ({}) {}",
                           hr, DX11Window::get_last_error_as_string());
       assert(false);
       return;
   }
   DX11Context::device_context()->
       VSSetConstantBuffers(1, 1, &perobject_buffer);

There's a per-frame buffer living in `StartSlot` 0, hence this one getting slot 1.

The user/game code is supposed to call `BeginScene()`, at which point the batching begins. The data looks good while it lives in system memory. Once there's a call to `EndScene()`, I "flush" the batch and call draw. For my testing, I've only got a batch of two.

   void Renderer2D::_flush() {
       HRESULT hr = S_OK;
       ID3D11DeviceContext *device_context = DX11Context::device_context();

       // Update per object data/constant buffer
       D3D11_MAPPED_SUBRESOURCE perobject_buffer_map;
       hr = device_context->Map(perobject_buffer, 0,
                               D3D11_MAP_WRITE_DISCARD, 0,
                               &perobject_buffer_map);
       if(FAILED(hr)) {
           PDR_ENGINE_ERROR("Could not map instance buffer: ({}) {}",
                           hr, DX11Window::get_last_error_as_string());
           assert(false);
           return;
       }

       size_t ob_size = sizeof(PerObjectData) * quad_count();
       memcpy(perobject_buffer_map.pData, _batch.instances, ob_size);
       device_context->Unmap(perobject_buffer, 0);

       _scene->texture_shader->bind();
       for(uint32_t tex = 0; tex < _batch.texture_count; tex++) {
           _batch.texture_slots[tex]->bind(tex);
       }

       UINT indices = sizeof(_indices) / sizeof(_indices[0]);
       device_context->DrawIndexedInstanced(indices, quad_count(), 0, 0, 0);

       _stats.draw_calls++;
   }

Now... what's weird is that if I do the copy like this:

   auto data = reinterpret_cast<PerObjectData *>(perobject_buffer_map.pData);
   for(uint32_t instance = 0; instance < quad_count(); instance++) {
       data->world_matrix  = _batch.instances[instance].world_matrix;
       data->color         = _batch.instances[instance].color;
       data->texture_index = _batch.instances[instance].texture_index;
       data->tiling_factor = _batch.instances[instance].tiling_factor;

       data++;
   }

It produces the exact same results. But if I omit the `data++` (which I did forget originally) it only draws the _second_ quad. With the increment included, both the `memcpy()` and `for` loop behave identically, only drawing the first.

I've checked with RenderDoc, and I am only getting one `PerObjectData` being copied over. The `struct` is 96 bytes, and the buffer on the GPU is 96,000 bytes (max of 1,000 quads for now). So did I manage to mark the rest of the buffer as read-only or something? `quad_count()` is returning the correct value, `ob_size` is being set to 192 bytes, so `memcpy()` should be fine. But even if it wasn't, the `for` loop should be so explicit as to be idiot proof. And yet... =)
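The size arithmetic in that paragraph can be pinned down with a small sketch. The constants are the numbers quoted in the post (96 bytes per `PerObjectData`, 1,000 quads maximum); the helper names `bytes_for` and `copy_fits` are made up for illustration and are not part of the renderer.

```cpp
#include <cstddef>

constexpr std::size_t kInstanceBytes = 96;   // sizeof(PerObjectData), per the post
constexpr std::size_t kMaxQuads      = 1000; // batch capacity, per the post

// Bytes needed to hold `quads` instances back to back.
constexpr std::size_t bytes_for(std::size_t quads) {
    return kInstanceBytes * quads;
}

// True when a copy of `quads` instances stays inside the mapped buffer.
constexpr bool copy_fits(std::size_t quads) {
    return bytes_for(quads) <= bytes_for(kMaxQuads);
}

static_assert(bytes_for(kMaxQuads) == 96000, "buffer size seen in RenderDoc");
static_assert(bytes_for(2) == 192, "ob_size for a batch of two");
static_assert(copy_fits(2), "a two-quad copy is well inside the buffer");
```

So the numbers are consistent with each other, which suggests the copy size is not the culprit.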

Any pointers or hints would be most welcome. Thanks in advance!


Hi,

I always write the padding bytes explicitly in the shader's cbuffer definition.

Give it a try and report back.

   cbuffer PerObjectData {
       float4x4 world_matrix;
       float4   color;
       uint     texture_index;
       float    tiling_factor;
       float2   pad;
   };

pdmpcb said:
It produces the exact same results. But if I omit the `data++` (which I did forget originally) it only draws the _second_ quad. With the increment included, both the `memcpy()` and `for` loop behave identically, only drawing the first.

The bug is apparently that it only draws one instance. So if you leave out `data++`, every quad gets written to the location of the first one, which means the last quad written is the one that gets drawn.
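Here is a rough CPU-side model of that explanation, just to make the two cases concrete. Everything in it is hypothetical: `Instance` is a trimmed-down record, the vector stands in for the mapped buffer, and `visible_to_shader` encodes the assumption that only the first element is ever read.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, trimmed-down instance record: one field is enough here.
struct Instance {
    uint32_t texture_index;
};

// The loop from the post: the destination advances after every write.
std::vector<Instance> copy_with_increment(const std::vector<Instance>& batch,
                                          std::size_t capacity) {
    std::vector<Instance> mapped(capacity, Instance{0});
    for (std::size_t i = 0; i < batch.size() && i < capacity; ++i) {
        mapped[i] = batch[i];
    }
    return mapped;
}

// The same loop with `data++` forgotten: every write lands in slot 0.
std::vector<Instance> copy_without_increment(const std::vector<Instance>& batch,
                                             std::size_t capacity) {
    std::vector<Instance> mapped(capacity, Instance{0});
    for (const Instance& src : batch) {
        if (capacity > 0) {
            mapped[0] = src; // overwrites the previous instance
        }
    }
    return mapped;
}

// Assumption: the draw only ever reads the first element of the buffer.
// Requires a non-empty buffer.
Instance visible_to_shader(const std::vector<Instance>& mapped) {
    return mapped.front();
}
```

With a batch of two, the first variant leaves quad one in slot 0 and the second variant leaves quad two there, which matches what was observed.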

You have omitted the part of the code where you set up the binding of the instance buffer and the input layout. You need to set `InstanceDataStepRate` in the `D3D11_INPUT_ELEMENT_DESC` for the instance buffer; otherwise the system will always use the first element in the buffer (and probably draw it twice in your case). But without seeing that code, there's no way to tell for sure.
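To illustrate what the step rate does, here is a simplified model of how the element index could be derived from the instance ID if the per-instance data were supplied through a vertex buffer, as this reply suggests. It is a sketch, not the D3D11 implementation: `element_for_instance` is a made-up name, and treating a step rate of 0 as "never advance" is an assumption. In real code the element would also need `D3D11_INPUT_PER_INSTANCE_DATA` as its `InputSlotClass`.

```cpp
#include <cstdint>

// Simplified model of per-instance indexing.
// Assumption: a step rate of 0 means the data never advances, so every
// instance reads element 0. A step rate of N reuses each element for N
// consecutive instances before moving on.
constexpr uint32_t element_for_instance(uint32_t instance_id,
                                        uint32_t step_rate) {
    return step_rate == 0 ? 0u : instance_id / step_rate;
}

// Step rate 1: each instance gets its own element.
static_assert(element_for_instance(0, 1) == 0, "first instance");
static_assert(element_for_instance(1, 1) == 1, "second instance");

// Step rate 0 (under the assumption above): both instances read element 0.
static_assert(element_for_instance(1, 0) == 0, "never advances");
```

Under this model, a two-quad batch with a step rate of 1 would read elements 0 and 1, while a missing step rate would draw the first element for both instances.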
