Having been away from active graphics programming for some time, I thought I'd try my hand at a Forward+ style renderer. I first got it working on D3D11, but in the interest of compatibility with other platforms I also ported it to OpenGL 3.2. I'm not looking into D3D12 or Vulkan yet; I just wanted to get it running with the “easy” APIs first.
My rendering is fairly simple: first I render all shadow maps into an atlas, then do a depth-only prepass, and finally the forward shading pass with a 3D cluster light lookup.
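To make the "3D cluster light lookup" concrete, here is a minimal sketch of how a pixel could be mapped to a cluster index on the CPU side. It assumes a common scheme (a uniform screen-space grid in x/y with exponentially spaced depth slices), which may not match the actual shader. The `ClusterGrid` struct, its dimensions, the near/far planes and the resolution are all hypothetical values picked for illustration.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical grid setup; none of these numbers come from the renderer
// described in the post.
struct ClusterGrid {
    int   dimX = 16, dimY = 8, dimZ = 24;        // cluster counts per axis
    float zNear = 0.1f, zFar = 100.0f;           // view-space depth range
    float screenW = 1920.0f, screenH = 1080.0f;  // render target size in pixels
};

static int clampInt(int v, int lo, int hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}

// Maps a pixel position and a positive view-space depth to a flat cluster
// index. x/y are split uniformly; depth slices are spaced exponentially
// (an assumed, commonly used layout).
int clusterIndex(const ClusterGrid& g, float px, float py, float viewZ) {
    assert(g.zNear > 0.0f && g.zFar > g.zNear);

    int cx = clampInt(static_cast<int>(px / g.screenW * g.dimX), 0, g.dimX - 1);
    int cy = clampInt(static_cast<int>(py / g.screenH * g.dimY), 0, g.dimY - 1);

    // Clamp depth into [zNear, zFar] so the logarithm stays well defined.
    float z = viewZ < g.zNear ? g.zNear : (viewZ > g.zFar ? g.zFar : viewZ);
    float t = std::log(z / g.zNear) / std::log(g.zFar / g.zNear);  // 0..1
    int cz = clampInt(static_cast<int>(t * g.dimZ), 0, g.dimZ - 1);

    // Flatten (cx, cy, cz) with x varying fastest.
    return cx + g.dimX * (cy + g.dimY * cz);
}
```

In a shader the same arithmetic would run per fragment, and the resulting index would select the light list for that cluster; how that list is stored is left out here.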
I'm testing a scene with a fairly large number (400) of unique meshes, which together with the Z-prepass and shadow maps ends up at roughly 2000 draw calls per frame. The passes are nothing out of the ordinary, just a loop of setting the world matrix, binding the vertex / index buffers, and drawing. I'm not even setting different materials for now.
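A minimal sketch of that per-pass loop, to show where the per-draw overhead comes from. `CountingBackend` is a hypothetical stand-in for the graphics API that only counts calls, so the sketch runs without a GPU context. The split into three shadow views plus the prepass and forward pass is just one assumed breakdown that lands on 2000 draws for 400 meshes; it also assumes no culling, so every mesh is drawn in every pass.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for D3D11 / OpenGL: counts calls instead of issuing them.
struct CountingBackend {
    std::size_t matrixUpdates = 0;
    std::size_t bufferBinds = 0;
    std::size_t drawCalls = 0;

    void setWorldMatrix(const float* /*matrix*/) { ++matrixUpdates; }
    void bindBuffers(int /*vertexBuffer*/, int /*indexBuffer*/) { ++bufferBinds; }
    void drawIndexed(int /*indexCount*/) { ++drawCalls; }
};

struct Mesh {
    float world[16];   // world matrix
    int vertexBuffer;  // opaque buffer handles
    int indexBuffer;
    int indexCount;
};

// One pass: for each mesh, set the world matrix, bind its buffers, draw.
void renderPass(CountingBackend& api, const std::vector<Mesh>& meshes) {
    for (const Mesh& m : meshes) {
        assert(m.indexCount > 0);
        api.setWorldMatrix(m.world);
        api.bindBuffers(m.vertexBuffer, m.indexBuffer);
        api.drawIndexed(m.indexCount);
    }
}

// One frame: shadow views into the atlas, depth prepass, forward pass.
// shadowViews is an assumed parameter, not a number from the post.
std::size_t renderFrame(CountingBackend& api, const std::vector<Mesh>& meshes,
                        int shadowViews) {
    for (int i = 0; i < shadowViews; ++i) {
        renderPass(api, meshes);  // shadow map view i
    }
    renderPass(api, meshes);      // depth-only prepass
    renderPass(api, meshes);      // forward shading pass
    return api.drawCalls;
}

// Builds a dummy scene of unique meshes with placeholder handles.
std::vector<Mesh> makeScene(std::size_t count) {
    std::vector<Mesh> meshes(count);
    for (std::size_t i = 0; i < count; ++i) {
        Mesh m = {};
        m.vertexBuffer = static_cast<int>(2 * i);
        m.indexBuffer = static_cast<int>(2 * i + 1);
        m.indexCount = 36;
        meshes[i] = m;
    }
    return meshes;
}
```

With `makeScene(400)` and `shadowViews = 3`, this issues one matrix update and one buffer bind per draw, so any fixed driver cost per call is paid 2000 times per frame.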
On OpenGL I end up with about 4x longer frame times (AMD GPU) or 2x longer (NVIDIA GPU) compared to D3D11. The clustered lighting shader doesn't seem to be a problem when compiled as GLSL, but the per-draw-call overhead just adds up much more rapidly on OpenGL. In my tests, using ordinary uniforms instead of UBOs helps somewhat. I also tried using VAOs to capture the vertex and index buffer bindings; these help reduce the draw call submission time on AMD, but then I just end up stalling for longer in SwapBuffers. On NVIDIA, however, using VAOs increases the draw call submission time, though the overall frame time doesn't suffer much.
I'd like to hear if others have had comparable experiences, or any hints as to what I might be doing wrong, if anything. Or maybe it's just that GPU driver optimization for D3D11 has been much more thorough over the years I've been away?