
Custom memory allocator for STL - dead end due to constructors?

Started by June 24, 2024 02:33 PM
36 comments, last by JoeJ 5 months, 2 weeks ago

You have trouble initializing an array of vectors sharing an instance of your custom allocator because there is no way to pass this allocator object through the array constructor to the “second level” constructor of the vectors.

This can be addressed, obviously, by replacing the raw arrays with a simple specialized container, a list of lists of T (maybe concretely derived from, or wrapping, std::array<std::vector<T>> or something similar) that has the constructors you need: one that receives a fixed size (and an allocator if needed) for the first-level array and an allocator for the second-level, initially empty vectors.

CustomAllocator<ExamplePayload> allo;
RaggedArray<ExamplePayload> e10(10, allo); // calls std::vector(allo) 10 times

Such constructors can hide from client code the loops that construct the vectors one by one, as well as unsightly qualifiers like std::forward, std::move, etc.
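A minimal sketch of what such a wrapper could look like; RaggedArray, CustomAllocator and ExamplePayload are just the illustrative names from the snippet above, not an existing library type:

#include <cstddef>
#include <memory>
#include <vector>

// Sketch only: a fixed number of rows, each row a std::vector constructed
// directly with one shared allocator instance.
template<class T, class Alloc = std::allocator<T>>
class RaggedArray
{
public:
    using Row = std::vector<T, Alloc>;

    RaggedArray(std::size_t rowCount, const Alloc& alloc)
    {
        rows_.reserve(rowCount);
        for (std::size_t i = 0; i < rowCount; ++i)
            rows_.emplace_back(alloc);   // the hidden "construct one by one" loop
    }

    Row&       operator[](std::size_t i)       { return rows_[i]; }
    const Row& operator[](std::size_t i) const { return rows_[i]; }
    std::size_t size() const                   { return rows_.size(); }

private:
    std::vector<Row> rows_;   // the outer array itself uses the default allocator here
};

// Usage as in the snippet above (CustomAllocator/ExamplePayload are hypothetical):
//   CustomAllocator<ExamplePayload> allo;
//   RaggedArray<ExamplePayload, CustomAllocator<ExamplePayload>> e10(10, allo);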

Omae Wa Mou Shindeiru

LorenzoGatti said:
This can be addressed, obviously, by replacing the raw arrays with a simple specialized container, a list of lists of T

Um - yes. You're right. And it's obvious indeed. Why haven't i thought of that?
(Probably because the power of templates and other modern practices still aren't second nature to me ; )
I'll consider this, thanks!

I'm just done with the port to EASTL. 12MB of code files, but it's fully compatible with STL, and it seems everything is still working.

Doing a test with EASTL: 3 min. 36 sec.

Another, switching back to STL: 3 min. 19 sec.

Just one run is not enough for a benchmark, but seeing no surprises is good so far.
Now i can work on the memory management and see if it really helps me. It better does.
After that i should know better whether i can pass allocators through constructors or not…

JoeJ said:
Doing a test with EASTL: 3 min. 36 sec. Another, switching back to STL: 3 min. 19 sec.

I have many vectors for temporary data used by parallel jobs

But i have thousands of objects, each having dozens of vectors. The objects are loaded from disk, processed and changed, saved back.

Though, i have millions of allocations live at a time, all quite small and just temporary. It's really no wonder fragmentation grows without end.

Are you sure you aren't simply doing unnecessary allocations of unnecessarily complex objects? Arrays of vectors of stuff should be long-lived and consolidated, and conversely creating short-lived objects shouldn't cause memory allocations. Adding an element to an empty std::vector causes a memory allocation.

For example, a 3D manifold where (conceptually!) each of thousands of polygons has a list of its vertices can be actually represented with three large long-lived std::vector instances, not thousands of small ones, and with sizes known in advance allowing one large reserve() call without reallocations unless you actually run out of space:

  • all vertices
  • all polygons
  • all half-edges

with polygons containing the index of the “first” of their half-edges, and half-edges containing one or two vertex indices and two or three half-edge indices.

In case of modifications it should be possible to recycle deleted polygons, half-edges and vertices by maintaining free lists, without fragmentation.
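To make that concrete, here is a rough sketch of such consolidated storage with a free list; all names and field choices are illustrative assumptions, not anyone's actual data structure:

#include <cstddef>
#include <cstdint>
#include <vector>

struct Vertex   { float x, y, z; };
struct HalfEdge { std::uint32_t originVertex, next, twin; }; // indices into the big vectors
struct Polygon  { std::uint32_t firstHalfEdge; };            // index of its "first" half-edge

struct Mesh
{
    std::vector<Vertex>   vertices;   // all vertices of the whole mesh
    std::vector<Polygon>  polygons;   // all polygons
    std::vector<HalfEdge> halfEdges;  // all half-edges

    // Free list of deleted slots, so edits recycle memory instead of fragmenting it.
    std::vector<std::uint32_t> freePolygons;

    void reserveAll(std::size_t v, std::size_t p, std::size_t he)
    {
        vertices.reserve(v);          // sizes known in advance: one reserve() each,
        polygons.reserve(p);          // no reallocations unless we really run out of space
        halfEdges.reserve(he);
    }

    std::uint32_t allocPolygon()
    {
        if (!freePolygons.empty())    // recycle a deleted polygon slot first
        {
            std::uint32_t i = freePolygons.back();
            freePolygons.pop_back();
            return i;
        }
        polygons.push_back({});       // otherwise grow at the back
        return std::uint32_t(polygons.size() - 1);
    }

    void freePolygon(std::uint32_t i) { freePolygons.push_back(i); }
};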

Omae Wa Mou Shindeiru

LorenzoGatti said:
Are you sure you aren't simply doing unnecessary allocations of unnecessarily complex objects? Arrays of vectors of stuff should be long-lived and consolidated

I could reduce the number of allocations by using a larger block size, e.g. the way Minecraft divides the world into large chunks of voxels. It would require only one mesh data structure containing those 3 vectors you mention later.

But i'd rather use small blocks, with many small meshes filling the space of a single Minecraft chunk.
The advantage is that if the level designer edits the scene, e.g. moves a wall or a door, there is less space to update than with large blocks, so the wait before the changes can be play tested is shorter.

My primary goal is to minimize waiting time on local changes. If you change a whole mountain it might take an hour, but if you only move a door it should take just 20 seconds, for example. And for this goal small blocks are better than large blocks, while for many other objectives it would be the opposite.

It's a balancing act to find the best compromise and settings, and it seems my ideal settings cause terrible memory fragmentation over time.
Processing the same scene at the same detail with larger blocks does not cause the slowdown. At least not that much. But to a small degree it does. If my scene were bigger, and for an open world game it would be much bigger, the slowdown would again become unacceptable pretty quickly, i guess.

If i quit my application and restart it, it's fast again. But i need to reload everything from disk, causing another wait.
After i quit my application, it takes minutes until all memory is freed, which is suspicious. I often kill the process to speed this up.

LorenzoGatti said:
In case of modifications it should be possible to recycle deleted polygons, half-edges and vertices by maintaining free lists, without fragmentation.

I will use freelists to manage my memory, but i can do so only after the current implementation work is done.
Currently most of the mesh editing happens due to stitching the boundaries of adjacent mesh blocks. For that i just add the edits to the back of the vectors, and the deleted mesh elements remain disconnected in memory. That's ok because only a few polygons are affected by stitching, if any. Mostly the boundaries match.

But i need four iterations of the stitching process, and i need to store each intermediate result in memory and on disk so that future local edits have minimal workloads. So i have 4 copies of almost the same mesh around, which i could compress using a diff algorithm like those used to compare two source code files.

In fact, the final mesh is only 5% of my data or even less. But i need to keep all the intermediate results. If i did not, i would need to recompute all the data for the whole world after each subtle change, and making games with this would not be possible.

Sorry for the long text, but my stuff is very unusual, crazy, and it might fail. I'm aware of that. ; )


Just adding another possible option here. If you're using C++17, have you checked out Polymorphic Memory Resources (PMR)? I think it is meant for exactly this situation, where the Allocator interface is not enough.

I wanted to experiment with it but had no time before, so I can't give examples or a detailed explanation. I just hope you can take a look and maybe it fits your problem better than trying your best to implement the extremely limited Allocator interface for STL containers (good luck naturally passing your custom memory allocator with that without hooking some dreaded globals, lol, it's gonna be hacky).

If it still doesn't suit you, EASTL or doing it yourself is what I'd agree with in this thread.
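Just a sketch of the kind of thing PMR allows, assuming a C++17 standard library with <memory_resource> (sizes and names are made up): an outer pmr vector hands its memory resource down to the inner vectors automatically, which is exactly the "second level" problem from the start of the thread.

#include <memory_resource>
#include <vector>

int main()
{
    // One pooling resource for lots of small, short-lived allocations.
    // It is passed explicitly as a pointer, no global hooks involved.
    // (Use synchronized_pool_resource if several threads allocate from it.)
    std::pmr::unsynchronized_pool_resource pool;

    std::pmr::vector<std::pmr::vector<int>> rows(&pool);
    rows.resize(1000);           // inner vectors pick up the same resource via
                                 // uses-allocator construction
    for (auto& row : rows)
        row.push_back(42);       // small allocations go to the pool, not straight to the heap
}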

---

Here are the first few paragraphs from the C++17 Complete Guide book, giving an overview of the feature's intention:

"Since C++98 the standard library has supported the ability to configure the way classes allocate their
internal (heap) memory. For that reason, almost all types in the standard library that allocate memory
have an allocator parameter. Thus, you can configure the way containers, strings, and other types allocate their internal memory if they need more space than the one allocated on the stack.

The default way to allocate this memory is to allocate it from the heap. But there are different
reasons to modify this default behavior:

  • You can use your own way of allocating memory to reduce the number of system calls.
  • You can ensure that allocated memory is located next to each other to benefit from CPU caching.
  • You can place containers and their elements in shared memory available for multiple processes.
  • You can even redirect these heap memory calls to use memory earlier allocated on the stack.

Thus, there can be performance and functional reasons.

However, using allocators (right) until C++17 was in many ways both tricky and clumsy (due to some flaws, too much complexity, and modifications with backward compatibility).

C++ 17 now provides a pretty easy-to-use approach for predefined and user-defined ways of memory allocation, which can be used for standard types and user-defined types."

---

More info:

  1. The book itself: C++ 17 Complete Guide; the relevant section is Part V Expert Utilities, Chapter 27 Polymorphic Memory Resources (PMR), page 305.
  2. A hopefully good enough article I found to give a rough idea of how it works. From what I understand, its first example lets a vector use a stack-based array of char (and somehow expand beyond it, which I still don't fully get; see the sketch after this list): https://www.cppstories.com/2020/06/pmr-hacking.html/
  3. The library directly, I assume it's in https://en.cppreference.com/w/cpp/header/memory_resource
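Roughly what I understand that first example to do (a sketch, assuming <memory_resource>; details may differ from the article): the vector draws from a char buffer on the stack through a monotonic_buffer_resource, and when the buffer is exhausted the resource "expands" by falling back to its upstream resource, which is the heap by default.

#include <memory_resource>
#include <vector>

int main()
{
    char buffer[256];                                   // memory "earlier allocated on the stack"

    // Hands out pieces of 'buffer' and never frees them individually;
    // once the buffer is used up, it falls back to the upstream resource (the heap).
    std::pmr::monotonic_buffer_resource res(buffer, sizeof(buffer));

    std::pmr::vector<int> v(&res);
    for (int i = 0; i < 16; ++i)
        v.push_back(i);                                 // allocations come from 'buffer' first
}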

arly said:
Just adding another possible option here. If you're using C++ 17, have you checked on Polymorphic Memory Resources (PMR)? Because I think it is literally meant for that when the Allocator is not enough.

Looks very interesting and related. I did not know about it.

But first i need to finish what i have started. Adding allocators everywhere is a lot of work, and it's tense because i mostly need to edit dozens of files until i can compile and run again, to see if stuff still works.

But i can definitely do it the way i want using EASTL. I know it already works, but i can't tell yet if it helps. To figure this out, i need to port more code. Probably all of it.
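For context, a rough sketch of why EASTL makes this easier, based on my reading of its documented allocator model (the exact interface should be checked against the EASTL headers; ArenaAllocator is a made-up name): EASTL allocators are plain per-instance objects that containers accept in their constructors.

#include <EASTL/vector.h>
#include <cstddef>
#include <cstdlib>

// Sketch of an EASTL-style allocator. Real code would carve from an arena or
// pool (and honor alignment) instead of calling malloc/free.
class ArenaAllocator
{
public:
    explicit ArenaAllocator(const char* name = "arena") : mName(name) {}

    void* allocate(size_t n, int /*flags*/ = 0)                                   { return std::malloc(n); }
    void* allocate(size_t n, size_t /*align*/, size_t /*offset*/, int /*f*/ = 0)  { return std::malloc(n); }
    void  deallocate(void* p, size_t /*n*/)                                       { std::free(p); }

    const char* get_name() const            { return mName; }
    void        set_name(const char* name)  { mName = name; }

private:
    const char* mName;
};

inline bool operator==(const ArenaAllocator&, const ArenaAllocator&) { return true; }
inline bool operator!=(const ArenaAllocator&, const ArenaAllocator&) { return false; }

void example()
{
    ArenaAllocator arena("mesh temp");
    eastl::vector<int, ArenaAllocator> v(arena);   // allocator instance passed at construction
    v.push_back(1);
}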

(nervously working like a robot… : )

Almost done.
Doing a test shows it solves the problem. No more slowdown over time. \:D/

