
Models, model matrices, and rendering

Started September 26, 2017 06:44 PM
17 comments, last by Infinisearch
14 hours ago, noodleBowl said:

And because I would want to draw a variety of the above, which may all have different transforms, I cannot put them into one buffer and save on draw calls, correct (excluding instancing in this case)?

 

13 hours ago, Infinisearch said:

First of all, if they are of the same vertex type you should be able to stick them in the same buffer, reducing state changes between draw calls. (Look at the arguments to a draw call to understand what I mean.)
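A minimal sketch of what I mean, assuming two meshes of the same vertex type packed into one buffer (the buffer and vertex type names are mine):

```cpp
// Mesh A occupies vertices [0, 36), mesh B occupies [36, 60) of sharedVB.
UINT stride = sizeof(Vertex);
UINT offset = 0;
context->IASetVertexBuffers(0, 1, &sharedVB, &stride, &offset);

context->Draw(36, 0);  // Draw(VertexCount, StartVertexLocation) -> mesh A
context->Draw(24, 36); // mesh B, same buffer, no state change in between
```

StartVertexLocation is the argument that lets both meshes live in one buffer.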

6 hours ago, noodleBowl said:

I'm actually not really sure what you are talking about here. The Draw call only has a vertex count and startVertexLocation. Am I looking at the wrong function? The only thing I can think of is the D3D11_INPUT_ELEMENT_DESC needed for an input layout.

I think I might have read you wrong in the first quote and added the statement in parentheses afterwards; I don't really remember what I was thinking when I wrote that. Ignore it for now... if I remember my line of thought I will post it.

 

7 hours ago, noodleBowl said:

More specifically, I don't understand why they have changed the InputSlot to 1. Is this because they are binding two buffers, and using 1 would point to the second buffer (m_instanceBuffer) where the instance modifications are stored? Or is it really just that they are reusing a semantic (TEXCOORD) and the two bound buffers (m_vertexBuffer and m_instanceBuffer) are treated as one big buffer?

This is vertex streams... you should look into them, not just for instancing. Basically, let's say you have three vertex components: position, normal, and texture coordinate. You can stick that data into one struct, two structs, or three structs (when I say structs, I mean arrays of structs). If you use multiple arrays you need a way to bind all of them; this is what input slots are for. But for instancing, the reason you use multiple slots is that the step rate for fetching data from those buffers is different: per vertex vs. per instance.
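A rough sketch of the two-slot layout such a tutorial would use (struct and member names are illustrative): slot 0 steps per vertex, slot 1 steps per instance, which is exactly why the instance element sets InputSlot to 1.

```cpp
D3D11_INPUT_ELEMENT_DESC layout[] = {
    // Semantic, Index, Format, InputSlot, ByteOffset, SlotClass, StepRate
    { "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0,
      D3D11_INPUT_PER_VERTEX_DATA,   0 },
    { "TEXCOORD", 0, DXGI_FORMAT_R32G32_FLOAT,    0, 12,
      D3D11_INPUT_PER_VERTEX_DATA,   0 },
    // Per-instance data lives in the buffer bound to slot 1 and advances
    // once per instance instead of once per vertex.
    { "TEXCOORD", 1, DXGI_FORMAT_R32G32B32_FLOAT, 1, 0,
      D3D11_INPUT_PER_INSTANCE_DATA, 1 },
};

ID3D11Buffer* buffers[2] = { m_vertexBuffer, m_instanceBuffer };
UINT strides[2] = { sizeof(VertexType), sizeof(InstanceType) };
UINT offsets[2] = { 0, 0 };
context->IASetVertexBuffers(0, 2, buffers, strides, offsets);
context->DrawInstanced(vertexCount, instanceCount, 0, 0);
```

So it's the first of your two options: InputSlot = 1 points at the second bound buffer; the two buffers are not treated as one big buffer.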

 

7 hours ago, noodleBowl said:

In the tutorial they create an InstanceType struct to hold the modifications they want to apply to the vertex positions. But in the case of using a transform (model) matrix to modify vertex data, would it be done the same way instead of using a constant buffer?

Yeah, you don't need to use a constant buffer. But there is also another way to implement instancing, using the system value SV_InstanceID (I think that's it). But you should learn that later.
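A hedged sketch of that SV_InstanceID variant (all names here are mine): per-instance matrices go in a structured buffer, and the vertex shader indexes it with its instance ID instead of reading a second vertex stream. The HLSL side is roughly `StructuredBuffer<float4x4> g_transforms : register(t0);` indexed by the SV_InstanceID input.

```cpp
#include <DirectXMath.h>

// One world matrix per instance, updated each frame, read by the VS as an SRV.
D3D11_BUFFER_DESC desc = {};
desc.ByteWidth           = sizeof(DirectX::XMFLOAT4X4) * instanceCount;
desc.Usage               = D3D11_USAGE_DYNAMIC;
desc.BindFlags           = D3D11_BIND_SHADER_RESOURCE;
desc.CPUAccessFlags      = D3D11_CPU_ACCESS_WRITE;
desc.MiscFlags           = D3D11_RESOURCE_MISC_BUFFER_STRUCTURED;
desc.StructureByteStride = sizeof(DirectX::XMFLOAT4X4);
device->CreateBuffer(&desc, nullptr, &matrixBuffer);

// A null view desc covers the entire structured buffer.
device->CreateShaderResourceView(matrixBuffer, nullptr, &matrixSRV);
context->VSSetShaderResources(0, 1, &matrixSRV);
context->DrawInstanced(vertexCount, instanceCount, 0, 0);
```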

-potential energy is easily made kinetic-

4 hours ago, JoeJ said:

Wouldn't it make sense to pretransform dynamic meshes too?

Thinking of skinning, tessellating, etc. multiple times for each shadow map, I assume pretransforming would be faster even if it means additional reads/writes to global memory. Drawing all models with one call is another advantage, GPU culling another; everything becomes less fragmented.

But I've never tried that yet.

One thing I tried is to store a matrix index in the vertex data (position.w) and load the matrix per vertex. That worked surprisingly well, although on AMD it wastes registers. I did not notice a performance difference between drawing 2 million boxes with a unique matrix per box or just using one global transform. It seems the test was rasterizer limited (the boxes were just textured, not lit).
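If anyone wants to try the same trick, a small sketch of the packing side (types are illustrative); the shader then does roughly `uint idx = (uint)input.pos.w;` and fetches its matrix from a buffer:

```cpp
#include <DirectXMath.h>

struct PackedVertex { DirectX::XMFLOAT4 pos; }; // .w carries the matrix index

PackedVertex v;
// Floats represent integers exactly up to 2^24, so the index survives the trip.
v.pos = DirectX::XMFLOAT4(x, y, z, static_cast<float>(boxIndex));
```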

I was speaking in the context of reducing draw calls for static data, as in data without per-frame changes. But if there are per-frame changes, you're right, there might be gains to be had by pretransforming skinned or tessellated meshes. Pretransforming per frame on the GPU will reduce calls depending on how you implement it... stream-out or compute shader. That's interesting that you had no performance degradation with a matrix index per vertex. But as you seem to imply, the results might differ with more complicated shaders.
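For the compute-shader flavor, a hedged sketch of the per-frame dispatch (shader and view names are mine): skin once into a UAV, then every pass, shadow maps included, draws the result as a plain vertex buffer. Note the output has to be created as a raw (byte-address) buffer with both D3D11_BIND_UNORDERED_ACCESS and D3D11_BIND_VERTEX_BUFFER, since structured buffers can't be bound as vertex buffers in D3D11.

```cpp
context->CSSetShader(skinCS, nullptr, 0);
context->CSSetShaderResources(0, 1, &bindPoseSRV);              // input verts
context->CSSetUnorderedAccessViews(0, 1, &skinnedUAV, nullptr); // output verts
context->Dispatch((vertexCount + 63) / 64, 1, 1);               // 64 threads/group

// Unbind the UAV so the same buffer can be bound as a vertex buffer next.
ID3D11UnorderedAccessView* nullUAV = nullptr;
context->CSSetUnorderedAccessViews(0, 1, &nullUAV, nullptr);
```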

-potential energy is easily made kinetic-


BTW @noodleBowl have I been clear enough?  Is there anything you don't understand?

-potential energy is easily made kinetic-

On 7.10.2017 at 3:58 PM, Infinisearch said:

But pretransforming per frame on the gpu will reduce calls depending on how you implement... stream-out or compute shader.

I never considered pretransforming by vertex shader and stream-out. Is it possible to stream out to GPU memory with DX12/VK?

Actually I planned to do it with a compute shader, but somehow it feels wrong to reimplement tessellation on my own if there is already hardware for that. On the other hand, compute seems more flexible than the hardware, e.g. if we want Catmull-Clark subdivision.

Also, having good compute but weak graphics experience, I tend to think: 'geometry and tessellation stages are useless; use compute and pretransform instead.' But then why did AMD spend so much effort to improve those things for Vega?

Any thoughts about this should help me to get a better picture... :)

2 hours ago, JoeJ said:

Is it possible to stream out to GPU memory with DX12/VK?

I've never done it but I don't see why not.

2 hours ago, JoeJ said:

Actually I planned to do it with a compute shader, but somehow it feels wrong to reimplement tessellation on my own if there is already hardware for that. On the other hand, compute seems more flexible than the hardware, e.g. if we want Catmull-Clark subdivision.

I've read one or two papers where they implement tessellation using the compute shader, and I think they said flexibility was one of the benefits... I don't remember much else. I think the last presentation I posted above (optimizing graphics with compute) has a section on using compute for tessellation.

3 hours ago, JoeJ said:

Also, having good compute but weak graphics experience, I tend to think: 'geometry and tessellation stages are useless; use compute and pretransform instead.' But then why did AMD spend so much effort to improve those things for Vega?

I don't have any tessellation experience and have kept away from it on purpose. IIRC Vega hasn't really improved tessellation performance that much, and Nvidia still kicks their butt at it (at least with high tessellation factors).

3 hours ago, JoeJ said:

Any thoughts about this should help me to get a better picture...

Well, like I said, I have no experience with tessellation, but again, as I said earlier, IIRC there were a few papers I read that seemed to implement it using compute. The only thing I can definitively say is that implementing it through the graphics pipeline will take up draw calls. Doing it through compute will have a performance advantage over fixed function on hardware with lots of compute units, but this advantage might be lost because instead of the expanded vertices being stored in the cache, they'd go through memory.

Maybe someone with more experience can chime in.

-potential energy is easily made kinetic-

I've worked on some games recently where we pre-transformed skinned meshes on the CPU. It's not ideal or a typical way to do things, but we did have spare CPU cycles available and were struggling for every GPU cycle we could find, so there was no reason for us to move that logic from the CPU to a compute shader.
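For anyone curious what that looks like, a minimal sketch of CPU pre-skinning (types and layout are illustrative, not the shipped code): blend each vertex by its bone palette once per frame, upload the result, and every subsequent pass draws it as static geometry.

```cpp
#include <DirectXMath.h>
#include <cstdint>

struct SkinnedVertex {
    DirectX::XMFLOAT3 pos;
    uint8_t           bones[4];   // bone palette indices
    float             weights[4]; // blend weights, sum to 1
};

void PreSkin(const SkinnedVertex* src, DirectX::XMFLOAT3* dst, size_t count,
             const DirectX::XMMATRIX* palette)
{
    for (size_t i = 0; i < count; ++i) {
        DirectX::XMVECTOR p   = DirectX::XMLoadFloat3(&src[i].pos);
        DirectX::XMVECTOR acc = DirectX::XMVectorZero();
        for (int b = 0; b < 4; ++b) {
            // Transform by each influencing bone and accumulate by weight.
            DirectX::XMVECTOR t =
                DirectX::XMVector3Transform(p, palette[src[i].bones[b]]);
            acc = DirectX::XMVectorAdd(acc,
                DirectX::XMVectorScale(t, src[i].weights[b]));
        }
        DirectX::XMStoreFloat3(&dst[i], acc);
    }
}
```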

5 hours ago, JoeJ said:

Is it possible to stream out to GPU memory with DX12/VK?

It's possible in DX10/GL, and DX12/VK haven't lost the ability ;)

5 hours ago, JoeJ said:

But then why did AMD spend so much effort to improve those things for Vega?

Because they're playing catch-up with NVidia :D Tessellation is used quite a bit by some games, and not at all by others. "Pass-through" geometry shaders (no geometry amplification) are useful for some things, and NVidia is really good at running them without the typical GS penalty.
GS is used in a few modern tricks that might catch on soon -- e.g. NV encourages people to use the GS stage as part of a technique to perform variable resolution rendering for VR, where the edges of the viewport have less resolution than the center.

11 hours ago, Infinisearch said:

BTW @noodleBowl have I been clear enough?  Is there anything you don't understand?

There is a whole lot I don't understand, haha, but that is because my graphics experience/knowledge is very fragmented. I just need more practice.

On 10/7/2017 at 9:45 AM, Infinisearch said:

This is vertex streams... you should look into them, not just for instancing. Basically, let's say you have three vertex components: position, normal, and texture coordinate. You can stick that data into one struct, two structs, or three structs (when I say structs, I mean arrays of structs). If you use multiple arrays you need a way to bind all of them; this is what input slots are for

Not sure if you are talking about interleaved vs. non-interleaved buffers? Or if you are talking about streaming data back to the CPU from the GPU to do further processing?

If you are talking about interleaved vs. non-interleaved buffers (I think this is what you mean, or at least the option that makes the most sense to me), why would I want to have non-interleaved buffers?

On 10/7/2017 at 9:45 AM, Infinisearch said:

Yeah you don't need to use a constant buffer

Just a general question about constant buffers/buffers, might be stupid, but in that tutorial they created an extra vertex buffer to hold position modifications, so in certain situations should I just use/bind an extra (non-constant) buffer?

For example, the MVP matrix is not really constant and can change every frame, so would it be better suited to a buffer that does not use the D3D11_BIND_CONSTANT_BUFFER flag (even though you can set the usage to dynamic)? Whereas something like a light's brightness, a value that wouldn't change, should go into a buffer that is created with the D3D11_BIND_CONSTANT_BUFFER flag?

Or is that all nonsense? Are there some optimizations going on behind the scenes, or is it just better to have them split up (coming from the viewpoint that there are probably far fewer constant buffer binds than binds of other buffer types, like vertex data, which would need to be rebound per mesh)?

7 hours ago, noodleBowl said:

Just a general question about constant buffers/buffers, might be stupid, but in that tutorial they created an extra vertex buffer to hold position modifications, so in certain situations should I just use/bind an extra (non-constant) buffer?

For example, the MVP matrix is not really constant and can change every frame, so would it be better suited to a buffer that does not use the D3D11_BIND_CONSTANT_BUFFER flag (even though you can set the usage to dynamic)? Whereas something like a light's brightness, a value that wouldn't change, should go into a buffer that is created with the D3D11_BIND_CONSTANT_BUFFER flag?

Or is that all nonsense? Are there some optimizations going on behind the scenes, or is it just better to have them split up (coming from the viewpoint that there are probably far fewer constant buffer binds than binds of other buffer types, like vertex data, which would need to be rebound per mesh)?

Constant buffers are basically made for data that changes per frame, so you're fine if you use them. In fact, IHVs optimize constant buffer access, if I'm remembering right. As far as that tutorial goes, they're using an extra vertex buffer because it's instancing, most probably because there are size constraints on constant buffers (64KB IIRC).
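A sketch of the standard per-frame pattern this implies (variable names are mine): a dynamic constant buffer updated with WRITE_DISCARD each frame.

```cpp
// Creation: constant buffer sizes must be multiples of 16 bytes.
D3D11_BUFFER_DESC cbDesc = {};
cbDesc.ByteWidth      = sizeof(DirectX::XMFLOAT4X4); // 64 bytes for the MVP
cbDesc.Usage          = D3D11_USAGE_DYNAMIC;
cbDesc.BindFlags      = D3D11_BIND_CONSTANT_BUFFER;
cbDesc.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
device->CreateBuffer(&cbDesc, nullptr, &mvpBuffer);

// Every frame: discard the old contents and write the new matrix.
D3D11_MAPPED_SUBRESOURCE mapped;
context->Map(mvpBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped);
memcpy(mapped.pData, &mvp, sizeof(mvp));
context->Unmap(mvpBuffer, 0);
context->VSSetConstantBuffers(0, 1, &mvpBuffer);
```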

7 hours ago, noodleBowl said:

Not sure if you are talking about interleaved vs non-interleave buffers? Or if you are talking about streaming out data back to the CPU from the GPU to do further processing?

If you are talking about interleaved vs non-interleave buffers (I think this is what you mean or the option that make the most sense to me), why would I want to have non-interleave buffers?

I guess they are called interleaved buffers, that makes sense, but I am definitely NOT talking about streaming data back to the CPU. However, you should know the other names for it: vertex streams, or structure-of-arrays layout (as in SoA vs. AoS).

As far as why you'd want non-interleaved buffers, let's start with what I said before about instancing. With instancing, some data is indexed per vertex and other data per instance; to simplify fetching, they are separated into different buffers. Also, some hardware likes SoA format for performance reasons (AMD IIRC), although using too many vertex streams can hinder performance.

Finally, there are multiple passes and shaders, and the wasting of space in a cache line, which again is about performance. For example, say your engine uses shadow maps: when you render them, the only data you need is position. If all your vertex data is packed tightly into one interleaved buffer, then when the GPU loads a cache line most of it goes to waste. A quick Google search turns up this page: https://anteru.net/blog/2016/02/14/3119/index.html which explains it in more depth.
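To make the shadow-map example concrete, a small sketch (buffer names are mine) of positions living in their own stream:

```cpp
ID3D11Buffer* streams[2] = { positionVB, attributeVB };          // slots 0 and 1
UINT strides[2] = { sizeof(DirectX::XMFLOAT3), sizeof(Attribs) };
UINT offsets[2] = { 0, 0 };

// Main pass: bind both streams.
context->IASetVertexBuffers(0, 2, streams, strides, offsets);

// Shadow pass: bind only the position stream, so every cache line the GPU
// pulls in is nothing but positions.
context->IASetVertexBuffers(0, 1, &positionVB, &strides[0], &offsets[0]);
```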

 

-potential energy is easily made kinetic-

