
3D Tile-based engine question (STL collection problem?)

Started by onnel, April 09, 2001 04:33 AM
5 comments, last by onnel
I've been working on a 3D tile-based engine (looking for sort of a 3D X-COM look and feel). I have the basic tile rendering all up and running, and that's fine. I'm running into some speed issues, though. I've crossposted this also to the 3D graphics group, as the problem may be there.

Basically, every tile is made up of 4 wall objects, a floor and a ceiling object. Any of these can, of course, be NULL (meaning they don't exist for the given tile). Previously, I simply looped through all my tiles and rendered whatever elements were present in a given tile (if there was a floor, I rendered it). Obviously, this begs for optimization, but it was a good starting point and worked fine when my world was 10x10x10 tiles (1,000 tiles). The problem was, expanding to 100x100x10 slowed the engine down to about 3 FPS... for obvious reasons! For my test, the actual number of squares with elements was roughly 20,000 (so the other 80,000 squares were being checked for no reason).

So I thought I would optimize by creating an STL vector of only those elements that need to be drawn. For example, the vector would contain not the squares, but pointers to the floor, the wall, etc. This means that instead of looping through 80,000 squares needlessly, I loop only through the 20,000 elements I know I want to draw.

The problem is that, using this STL vector of pointers to my elements, performance is completely dead. The machine just seems to hang. Is 20,000 elements too many for an STL vector? I'm on VC++ 5.0 and know that STL compile times aren't supposed to be great, but is there an actual speed issue? I cannot imagine how looping through 20,000 elements can be slower than looping through 100,000. Would I be better off not using the STL vector and instead implementing the items as a linked list or something similar?

I know that the push_back onto the vector will be slow, but this is only done once when loading all the elements. To actually access them, I use a for loop < vector.size and use at(count). Is there a problem with this? What I need is a fast method for mainly looping through them, with the ability (not often used) to add or subtract items from the collection. Any ideas?

Note: in my final version, maps probably won't be that big (more like 50x50x5 or so), but I'd like to be able to tackle the bigger maps with the engine.

All help appreciated!

Best,
Onnel
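Edit: in code terms, here's roughly the difference between the two approaches (types simplified; renderElement() is a stand-in for my actual drawing code):

#include <vector>

struct Element { /* wall, floor or ceiling geometry */ };

struct Tile {
    Element* walls[4]; // any of these can be NULL
    Element* floor;
    Element* ceiling;
};

void renderElement(Element* e); // stand-in for the real drawing code

// Old approach: walk every tile and test every slot (slow on mostly-empty maps).
void renderAllTiles(Tile* tiles, int count)
{
    for (int i = 0; i < count; ++i) {
        for (int w = 0; w < 4; ++w)
            if (tiles[i].walls[w]) renderElement(tiles[i].walls[w]);
        if (tiles[i].floor)   renderElement(tiles[i].floor);
        if (tiles[i].ceiling) renderElement(tiles[i].ceiling);
    }
}

// New approach: keep a flat vector of only the elements that actually exist.
void renderDrawList(const std::vector<Element*>& drawList)
{
    for (size_t i = 0; i < drawList.size(); ++i)
        renderElement(drawList[i]);
}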
quote:
Original post by onnel
Basically, every tile is made up of 4 wall objects, a floor and a ceiling object. Any of these can, of course, be NULL (meaning they don't exist for the given tile). Previously, I simply looped through all my tiles and rendered whatever elements were present in a given tile (if there was a floor, I rendered it). Obviously, this begs for optimization, but it was a good starting point and worked fine when my world was 10x10x10 tiles (1,000 tiles). The problem was, expanding to 100x100x10 slowed the engine down to about 3 FPS... for obvious reasons! For my test, the actual number of squares with elements was roughly 20,000 (so the other 80,000 squares were being checked for no reason).


I'm not quite sure why there's the difference here. Is it that most of your tiles only have two or three objects that need to be rendered?
quote:

Is 20,000 elements too many for an STL vector?


Not at all. A vector is just a wrapper for a dynamically allocated array in memory. If you can 'new' 20,000 objects, you can use a vector of that size. How large is each object? It sounds like it might be a vector of pointers, in which case each object would be 4 bytes (80k). This should be fine.
quote:

I'm on VC++ 5.0 and know that STL compile times aren't supposed to be great, but is there an actual speed issue?


You can easily abuse STL. For vectors specifically, you should know that:
- it is massively wasteful to insert or remove anywhere but at the end of the vector
- every time you reach the bounds of the memory allocated to the vector, it needs to reallocate a new block of a larger size, copy all the elements over, and deallocate the old block.

The default behavior of a vector that's having elements inserted by push_back is to double in size whenever it needs to; each time it doubles, it copies everything over. Therefore, if you have a loop that's doing push_back, the vector's capacity will grow:
0, 1, 2, 4, 8, 16, 32, 64, 128,...
Each time it changes, it must copy all of its elements to the new memory and delete the old memory.

So, the short answer is that a vector CAN be slow, although looping through each element in a vector is exactly the same as looping through an array. In order to avoid excessive reallocations, I would suggest using vector::reserve to pre-allocate a large amount of space; you may still outgrow this space, but it cuts down on the initial reallocations.
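For instance, a minimal sketch (Element here stands for whatever your pointers point at):

#include <vector>

struct Element; // your wall/floor/ceiling type

void buildDrawList(Element** elements, int elementCount)
{
    std::vector<Element*> drawList;
    drawList.reserve(20000); // one up-front allocation instead of ~15 doublings

    for (int i = 0; i < elementCount; ++i)
        drawList.push_back(elements[i]); // never reallocates until 20,000 is exceeded
}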
quote:

Would I be better off not using the STL vector and instead implementing the items as a linked list or something similar?


Again, it depends on what you're doing. If you're reading in the data all at once, and only reading the data after that (i.e. you don't change the container after you've read in the data), you can't get faster than a vector. This sounds like what you're doing.
quote:

I know that the push_back onto the vector will be slow, but this is only done once when loading all the elements.


If you reserve first, push_back is very fast. However, each time you insert any element into any kind of container, STL needs to copy that element--there's no way around this the way STL was designed. If you're working with large objects, this could be substantial. My question would be: what's your vector?
vector<Object*> vec; // vector of pointers, very fast push_back
vector<Object> vec;  // vector of objects, calls copy constructor of Object, could be very slow
quote:

To actually access them, I use a for loop < vector.size and use at(count). Is there a problem with this?


There are many ways to skin a cat. If I were you, I would do this:
for (vector<Type>::iterator it = vec.begin(); it != vec.end(); ++it)
    do_something_with(*it); // *it == at(count)

These might compile to the exact same thing, but the loop I show is general for almost all STL containers.
quote:

What I need is a fast method for mainly looping through them with the ability (not often used) to add or subtract items from the collection. Any ideas?


Ah, now here's a problem. If you're adding/subtracting items, and you're doing it from the middle of the container, you should be using a list. If you are only adding/subtracting from the beginning and end, you can get away with using a deque. Only if you're adding/subtracting from the end exclusively should you use a vector.
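A toy illustration of the difference (ints instead of your element pointers):

#include <iterator>
#include <list>
#include <vector>

void eraseFromMiddle(std::list<int>& l, std::vector<int>& v)
{
    // list: the erase itself is O(1) once you have an iterator
    std::list<int>::iterator li = l.begin();
    std::advance(li, 500); // walking there is O(n), but erasing is not
    l.erase(li);

    // vector: the erase is O(n) -- every element after index 500 shifts down
    v.erase(v.begin() + 500);
}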

My best advice would be to put some profiling code into your engine so you can see for yourself which part is taking all this time. If the computer "seems to freeze", nail down which specific functions are causing this. The clock() routine will help a lot with this.
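Something along these lines, as a rough sketch (wrap whichever section you suspect):

#include <stdio.h>
#include <time.h>

void renderFrame()
{
    clock_t start = clock();

    // ... the suspect code, e.g. your tile loop ...

    clock_t end = clock();
    printf("tile loop: %.1f ms\n", (end - start) * 1000.0 / CLOCKS_PER_SEC);
}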

Good luck.
Stoffel, first thanks for the awesomely detailed reply!

quote:

...For my test, the actual number of squares with elements was roughly 20,000 (so the other 80,000 squares were being checked for no reason).

I'm not quite sure why there's the difference here. Is it that most of your tiles only have two or three objects that need to be rendered?



Because often many (up to 90% in extreme cases) of the squares have NOTHING in them, so there's no reason to be looping through all of them every rendering cycle. Much better to just hit those items that need rendering!

My approach may change once again once I use an octree or similar method for excluding areas outside of the camera's view.

My vector is for pointers, so the memory shouldn't be too horrible. I didn't know about reserve, so that should REALLY help. Hopefully the slowdown I was seeing was all of the calls to push_back without having reserved. I'll add reserve and see what that does.

Also, as far as switching from a vector to a list: I will only be adding at the end and will only rarely be removing (these removals will, however, be of arbitrary objects, not guaranteed to be at the end). What is the downside of switching to a list from a vector? Is it a no-brainer, or do I lose something?

Thanks, too, for the clock() method... I'd never used it before.

Thanks again for the great response. Most of it was stuff I already knew, but some of it was new and it was great to get confirmation of the other stuff!

Best,
Onnel
quote:
Original post by onnel
Also, as far as switching from a vector to a list: I will only be adding at the end and will only rarely be removing (these removals will, however, be of arbitrary objects, not guaranteed to be at the end). What is the downside of switching to a list from a vector? Is it a no-brainer, or do I lose something?


You lose some things. There's no (efficient) random access to lists, so you can't look at the Nth element in constant time like you can for a vector. You can only go forward and backward from a given element--hence lists use bidirectional iterators, whereas vectors use random-access iterators.

Also, there are now two extra pointers for each element (next and previous pointers), so if your element type is a pointer, the list is 200% larger in memory than the vector of pointers.

The benefit, of course, is that you can insert and remove elements in constant time, whereas doing so in a vector is a linear operation.
Thanks for the info. I think I will stick with my vector for now, as deletes will be very rare. I'll probably use a separate data structure for objects that are more likely to change (like characters, effects, etc., as compared to terrain) so the different data structures can best fit the model of how the data will be used.

Also, I reserved the space ahead of time for my vector, and I am now getting about 13 FPS with 20,000 squares (made up of two triangles each) being displayed. This is on a P750 laptop with a 3D card (16MB ATI card... surprisingly good!) and 256MB of memory.

The FPS is not resolution dependent, so it seems to be simply a matter of how many times I can call drawPrimitive in a given render call. The 20,000 calls is simply too many. I don't really know how many triangles D3D can push out, but I didn't think 40,000 would be excessive on this machine.

Is there any way to optimize calls to D3D drawPrimitive, or any tricks for not rendering every object on every rendering call? I could limit the user's view so that they never see that many objects (and use quad/octrees to make sure the offscreen items aren't rendered), but that's not really what I envisioned for the game. I wanted a user to be able to zoom all the way out (LOD for meshes can't really help here, since I'm just rendering lots of squares made of 2 triangles apiece, but obviously can for other objects like the characters) or zoom in parallel to the map, both of which could cause a large number (20,000+) of squares, and therefore an even larger number of triangles, to be displayed.

One huge speed optimization would be if I could display larger areas as single large squares instead of individual squares. This would get really complicated, as the squares might have to be reconfigured on the fly as terrain deformed. If I have no other options, I may pursue this, however. A rough sketch of what I mean is below. Hopefully the forum will have some better suggestions!
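Something like this, for a single row of floor tiles (emitQuad() is a stand-in for however the merged quad actually gets drawn; merging across rows, and re-splitting runs when terrain deforms, would come on top of this):

struct FloorTile
{
    bool exists;
    int  textureId;
};

void emitQuad(int startX, int endX, int textureId); // stand-in for real output

void emitMergedRow(const FloorTile* row, int width)
{
    int runStart = 0;
    while (runStart < width)
    {
        if (!row[runStart].exists) { ++runStart; continue; }

        // extend the run while the tiles are identical
        int runEnd = runStart + 1;
        while (runEnd < width && row[runEnd].exists &&
               row[runEnd].textureId == row[runStart].textureId)
            ++runEnd;

        emitQuad(runStart, runEnd, row[runStart].textureId); // one quad for the whole run
        runStart = runEnd;
    }
}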

Thanks for all the input.

Onnel


Think about this: at 40,000 triangles/frame and 13 frames per second, you are drawing over 500,000 triangles a second (40,000 x 13 = 520,000). That is a lot for a non-T&L card. I wrote a simple benchmark on my computer that drew 5-pixel triangles with no textures, and I only got about 600,000 tris/sec (P3 700, G400, OpenGL). Consider that a Kyro II can hit 125 fps in Quake 3. With an aggressive estimate of 5 million tris/sec, that is 40,000 tris/frame (5,000,000 / 125). So you are hitting the tri count of Quake 3 (and actually I think the Q3 tri count is much lower, maybe 10-20k).

Calling drawPrimitive for each triangle will really kill you. You should have a big array with all the triangles in it. Then use indices (not pointers) to access the triangle coordinates. I think these are called Vertex Buffers in Direct3D? Build a vector of the indices to visible tiles each frame and use reserve() to estimate the number. Then pass this entire vector to one D3D call.
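A rough CPU-side sketch of what I mean (drawIndexedBatch() is a stand-in for the API's indexed-draw call -- DrawIndexedPrimitive in D3D -- and the vertex layout is just an example):

#include <vector>

struct Vertex { float x, y, z, u, v; };

// stand-in for the one D3D call per frame
void drawIndexedBatch(const Vertex* verts, const unsigned* indices, size_t indexCount);

std::vector<Vertex> allVertices; // every tile's 4 corners, built once at load time

void renderVisibleTiles(const std::vector<int>& visibleTiles)
{
    std::vector<unsigned> indices;
    indices.reserve(visibleTiles.size() * 6); // 2 triangles = 6 indices per tile

    for (size_t i = 0; i < visibleTiles.size(); ++i)
    {
        unsigned base = (unsigned)visibleTiles[i] * 4; // 4 corners per tile
        // two triangles: 0-1-2 and 0-2-3
        indices.push_back(base + 0); indices.push_back(base + 1); indices.push_back(base + 2);
        indices.push_back(base + 0); indices.push_back(base + 2); indices.push_back(base + 3);
    }

    drawIndexedBatch(&allVertices[0], &indices[0], indices.size()); // one call, not 20,000
}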

Do all 20,000 tiles show up on screen at once? If not, you REALLY REALLY need to cull the tiles as best you can.

Also, does each tile have its own texture? If so, you should organize your drawing so that the number of times you change textures is minimal. If you are just using colors per vertex (instead of textures), this is no big deal.
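For example, a minimal sort-by-texture sketch (setTexture() and drawTile() are stand-ins for your own calls):

#include <algorithm>
#include <vector>

struct DrawItem
{
    int textureId;
    int tileIndex;
};

void setTexture(int textureId); // stand-in for the API's texture bind
void drawTile(int tileIndex);   // stand-in, or accumulate into a per-texture batch

bool byTexture(const DrawItem& a, const DrawItem& b)
{
    return a.textureId < b.textureId;
}

void renderSorted(std::vector<DrawItem>& items)
{
    std::sort(items.begin(), items.end(), byTexture);

    int currentTexture = -1; // assumes real texture ids are non-negative
    for (size_t i = 0; i < items.size(); ++i)
    {
        if (items[i].textureId != currentTexture)
        {
            currentTexture = items[i].textureId;
            setTexture(currentTexture); // only fires when the texture actually changes
        }
        drawTile(items[i].tileIndex);
    }
}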
Heh... when you put it in those terms, I feel better about my engine already... mind you, there is lots left to do. :-)

I've already gotten the tip to use VertexBuffers, and that's my next step. Unfortunately, there will be multiple textures, but as much as possible I will render the items with the same texture at the same time. What's the best approach to doing this while still using a VertexBuffer? I cannot imagine having a different VertexBuffer for each group with a different texture! Should I try to align items in the VertexBuffer so that those with the same texture are grouped together? This sounds like it could get messy!

I'd like to be able to show the whole map at once. I know that, ideally, as you zoom out, you should use LODs. The problem is, with how I imagine my stuff looking, it's not easy to see how LODs would work for the meshes (no problem for the textures). My setup is like X-COM, sort of like a tile-based game. I guess if I introduce the concept of compiling a map, I can create LODs when the map is compiled... not a bad idea actually, though more work programmatically!
Thanks!

Onnel

