First question: why do you need two different struct types to hold such similar data? Both cases require a vertex array id, program id, primitive type, buffer ids etc. This means you could have a single "render item" struct that holds generic data about every mesh you want to draw. That in turn lets you store these items efficiently and sort them by criteria that minimize the number of draw calls required. Remember - on this level it's all just meshes drawn with some set of resources.
If you need specific treatment for different meshes (e.g. drawing 2D/UI separately from regular 3D geometry, or processing transparent geometry differently) you can create "lists" or "queues" which hold these uniform render items and can be processed on their own.
For more on these topics just search these forums for the mentioned keywords ("render queue", "render items/atoms/tasks") and you will find plenty of very interesting threads where a lot of clever people describe how they do this kind of stuff. The main and best-known article about this, mostly focused on sorting these items, is: http://realtimecollisiondetection.net/blog/?p=86
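Just to illustrate the sorting idea from that article - this is a hedged sketch, not the article's exact layout: you pack the fields you want to sort by into a single integer key, with the most important criteria in the highest bits. Field widths and ordering here are made up, pick whatever matches your own batching criteria.
(C++, hypothetical sort key)
#include <cstdint>
inline uint64_t makeSortKey(uint8_t layer, uint16_t program,
                            uint16_t material, uint32_t depth)
{
    return (uint64_t(layer)    << 56) |   // queue/layer in the top bits
           (uint64_t(program)  << 40) |   // then shader program
           (uint64_t(material) << 24) |   // then material/textures
           (uint64_t(depth & 0xFFFFFF));  // then depth, truncated to 24 bits
}
Sorting items by this key naturally groups everything that shares a program and material next to each other, which is exactly what you want before batching.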
I think the main question you have to answer is what granularity and how many levels of abstraction of render chunk you need - in the initial post you mentioned Nodes but also a DrawRangeElementsBaseVertex struct, and those look like something that could be merged and created by a higher level rendering layer (the one that knows which meshes to draw per game object). So instead of having these two layers, just combine them into one that has all the info - base per-mesh uniforms (matrices, material properties), textures, vertex/index buffers, primitive types etc.
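A minimal sketch of what such a merged struct could look like - the names and fields (MaterialParams, the glm matrix, the texture slot count) are just assumptions, adjust them to your own data:
(C++, hypothetical "render item")
struct RenderItem
{
    // GPU resources
    GLuint   vertexArray;    // VAO id
    GLuint   program;        // shader program id
    GLuint   textures[4];    // however many texture slots you need
    GLenum   primitiveType;  // GL_TRIANGLES etc.
    // draw range (what DrawRangeElementsBaseVertex needs)
    GLuint   indexCount;
    GLuint   firstIndex;
    GLint    baseVertex;
    // per-mesh data
    glm::mat4      modelMatrix;   // or whatever transform representation you use
    MaterialParams material;      // your own uniform/material block
    uint64_t       sortKey;       // derived from the fields above
};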
I can quickly describe what I have and what works fine for me, though by no means treat it as some kind of "best solution" - far from it.
I have 3 layers -
1. HIGH LEVEL INTERFACE - directly accessed by my game logic; it contains container classes like StaticMesh, AnimatedMesh, BillboardMesh etc. These objects hold information such as Resource<Mesh> (which holds buffer ids etc.), Transform, visibility (drawn or not), and some specific logic like transforming one mesh into another, shape keys (morph targets) etc. This layer has two main roles:
a) It interfaces with game logic so I can affect the state of these objects from within my scripts, enabling, disabling, changing parameters as I need it.
(scripting)
StaticMesh@ baseMesh("modelfile", transform);
baseMesh.effect = "outline";
baseMesh.enabled = false;
BillboardMesh@ tree("tree_sprite", transform);
b) From the Renderer's perspective, these objects belong to the rendering system and are iterated each frame to construct lower level render tasks. Basically each object knows how to produce the 1..N render tasks required to "draw" it.
(C++, render_system.cpp)
for (auto& mesh : staticMeshes_)
{
    mesh.render(renderQueue);
}
(C++, static_mesh.cpp)
void render(RenderQueue& queue)
{
    for (auto& submesh : model->getSubmeshes())
    {
        RenderTask task;
        task.vertexBuffer = submesh.vertexBufferId;
        task.indexBuffer = ...
        task.primitiveType = TRIANGLES;
        // and so on...
        queue.submit(SCENE_OPAQUE_GEOM, task);
    }
}
2. MID LEVEL - render queues and render tasks. Each frame the renderer iterates the existing objects (StaticMesh, AnimatedMesh, BillboardMesh) and collects RenderTasks from them. Each RenderTask contains all the information needed to render some mesh, and each object can create as many as it needs to "render itself". One important thing to note is that these tasks have nothing to do with draw calls YET. These tasks are then added to various queues - opaque scene geometry, alpha tested geometry, skybox, 2D UI, shadow map etc. - and these lists are later processed by different "passes".
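A hedged sketch of this mid layer interface - the queue ids, the sort-on-yield approach and the container choice are just how I would express the idea here, not my exact code:
(C++, hypothetical render queue)
#include <array>
#include <vector>
enum QueueId { SCENE_OPAQUE_GEOM, SCENE_ALPHA_TEST, SKYBOX, UI_2D, SHADOW_MAP };

class RenderQueue
{
public:
    void submit(QueueId id, const RenderTask& task)
    {
        queues_[id].push_back(task);
    }
    // Returns the tasks of one queue sorted by their sort key; it does NOT
    // clear the queue, so several passes can iterate the same tasks.
    std::vector<const RenderTask*> yield(QueueId id) const;
    void clear(); // called once at the start of each frame
private:
    std::array<std::vector<RenderTask>, SHADOW_MAP + 1> queues_;
};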
3. LOW LEVEL (OpenGL calls) - once the queues are constructed and filled with render tasks, I begin the actual processing, and this heavily depends on the passes I have (which are another topic). Basically each pass may request one or more queues to "yield" their tasks (this does not clear the queue, so various passes can iterate over the same sets of tasks) sorted according to some sort key and algorithm. Based on this range of items I perform batching, which is done per-pass since different passes may batch differently. My sorted render tasks are processed one by one and I construct the final draw call structs. This allows me to do things like: for 100 submitted render tasks, if they all use the same shader and buffers, I can stuff them all into a single indirect draw call. So the result of the third layer is draw calls - indirect calls, instanced calls, single calls - all based on the sorting and batching criteria.
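To make the batching step more concrete, here is a rough sketch of that loop. It assumes a RenderTask carries fields like program, vertexArray, indexCount, firstIndex and baseVertex (those names are made up for this example), and it only batches into indirect draws - the instanced/single-call paths are left out.
(C++, hypothetical batcher)
#include <vector>
// Matches the layout glMultiDrawElementsIndirect expects.
struct DrawElementsIndirectCommand
{
    GLuint count;
    GLuint instanceCount;
    GLuint firstIndex;
    GLuint baseVertex;
    GLuint baseInstance;
};

struct Batch
{
    GLuint program;
    GLuint vertexArray;
    std::vector<DrawElementsIndirectCommand> draws; // one entry per task
};

std::vector<Batch> buildBatches(const std::vector<const RenderTask*>& sortedTasks)
{
    std::vector<Batch> batches;
    for (const RenderTask* task : sortedTasks)
    {
        // A required state change means a batch break.
        if (batches.empty() ||
            batches.back().program     != task->program ||
            batches.back().vertexArray != task->vertexArray)
        {
            batches.push_back({task->program, task->vertexArray, {}});
        }
        batches.back().draws.push_back(
            {task->indexCount, 1, task->firstIndex, (GLuint)task->baseVertex, 0});
    }
    return batches;
}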
Also on this level I have not only draw call commands but also the state changes needed - setting the current program or vertex buffers. So it's an ordered stream of commands which are then pushed to the GPU one after another. Worth noting: it's not the top level interface that decides there should be a SET_PROGRAM command, but the mid tier, and only after processing things through the batcher. This is needed because the top level does not have the information required to decide on state changes properly and would cause a lot of unnecessary ones, so I derive these commands only from the batcher, which knows when the program needs to change to another one. Basically a state change equals a batch break in this case, so I usually end up with as many batches as state changes (buffers + programs + render states) combined.
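The command stream itself can be as simple as a tagged struct; again this is only a sketch of the idea (names are made up, and it assumes the per-batch indirect commands were already uploaded to a GL_DRAW_INDIRECT_BUFFER in the same order as the batches). State change commands are emitted only when the batcher starts a new batch.
(C++, hypothetical command stream)
#include <vector>
enum class CommandType { SET_PROGRAM, BIND_VERTEX_ARRAY, DRAW_INDIRECT };

struct Command
{
    CommandType type;
    GLuint      id;        // program or VAO id, depending on type
    GLsizei     drawCount; // number of indirect commands (DRAW_INDIRECT only)
    GLintptr    offset;    // byte offset into the indirect buffer
};

void emitCommands(const std::vector<Batch>& batches, std::vector<Command>& out)
{
    GLintptr offset = 0;
    for (const Batch& batch : batches)
    {
        // State changes only at batch boundaries; a real implementation would
        // also skip redundant ones (e.g. same program when only the VAO changed).
        out.push_back({CommandType::SET_PROGRAM, batch.program, 0, 0});
        out.push_back({CommandType::BIND_VERTEX_ARRAY, batch.vertexArray, 0, 0});
        out.push_back({CommandType::DRAW_INDIRECT, 0,
                       (GLsizei)batch.draws.size(), offset});
        offset += (GLintptr)(batch.draws.size() * sizeof(DrawElementsIndirectCommand));
    }
}
Executing the stream is then just a loop over the array that switches on the command type and calls glUseProgram, glBindVertexArray or glMultiDrawElementsIndirect accordingly.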
The output of stage 3 is a simple array of commands, which is then executed by the GL context. It works pretty well so far and lets me batch things like the Sponza scene into just a few indirect calls more or less automatically - I just submit all the submeshes and let the system handle it. Let me know if you have any questions.