Windows memory mapped file twice as slow as fread() when cached

39 comments, last by Puffin 4 years, 4 months ago
Kylotan said:
a light breeze said:

So, what do you think would be a good use-case for memory-mapped files that cannot be accomplished with traditional file i/o? I can think of only one, and it's such a niche case that it might as well not exist: passing the contents of a file to an API call, where the file is too big to fit into physical memory + swap space.

I'd say it's the other way around - what's a good use case for having to open a file, check its length, choose how much memory to allocate, read bit by bit, etc. - vs. simply mapping the file and letting the OS manage mapping the relevant parts into memory? This seems like a situation where the OS can usually do a better job than you, leaving the file I/O API for very specific optimisations.

That line of reasoning may work well on most operating systems, but on Windows you will suffer a significant performance penalty - hence this thread. Just because the OS can do a better job doesn't mean it will.

That leaves convenience as a reason for choosing memory-mapped files despite the performance penalty, but for small files the convenience amounts to skipping a five-line reusable function that you only need to write once. And for large (multi-gigabyte) files, the performance penalty becomes too big to ignore.


I'm just answering your opening question - it makes more sense to use memory-mapped I/O in pretty much every case unless you know there's an exception - and that's the reason we have this thread. There's no good or obvious reason why a mapped file should be slower, but plenty of situations where it could be expected to be faster, and in most cases it is also simpler, so it's natural that - if the API is good enough - you would reach for that first, meaning the real question becomes "what is a good use case for manual file I/O that can't be accomplished with a memory mapped file?"

I can't imagine a 5-line general purpose file handler that isn't going to churn through and fragment your process memory in many cases where a mapped file would not.

The reason for opening this thread was that there is a surprisingly high page-mapping overhead with memory mapped files on Windows (like 100 times more than one could reasonably expect, making it 10 times slower than just reading memory), and I was wondering if there was a workaround. But it seems there isn't.

So fread() is generally faster when you can load into existing buffers. One thing to note, though: if memory needs to be allocated for the file, the allocation has the same page-mapping overhead (if the allocation is large or the heap is being grown), and the fread() method becomes slower because fread() has some overhead of its own. fread() is faster if you are just streaming the data to the GPU by reusing buffers, loading another game level into existing buffers, running a tool that processes multiple files while reusing buffers, or if the data is compressed/encoded in such a way that it needs to be decoded somewhere else anyway. The performance difference is biggest if the file is already cached in memory (2x) or if the disk is fast (30% on my SSD).

Memory mapped files might conserve memory if you don't have to make a copy of the file in the application, which reduces the chance of files being evicted from the file system cache.

The interesting thing about loading "to existing buffers" is that you usually then have to move the data somewhere else in order to reuse that buffer later or indeed to be able to use the data you loaded, meaning it's hard to make a good comparison of what the overall performance burden is like.

The use cases are basically what Puffin described.

Memory mapping takes a view of the file and puts it directly into your process address space. The costs are that it takes up address space, you lose control over how the pieces are loaded, and the data must be constantly available (such as on a disk) rather than coming through the more general data stream interfaces. The benefits are that you can use the memory directly with no copies involved, and that you can share that direct access with other processes as an IPC method.

With traditional file loading, the OS reads blocks and lets you handle the data as a stream. The costs are that you have multiple copies of the data, and that you generally must parse, process, or iterate over your data instead of having it as one chunk. The benefits are that you can constrain it to smaller buffers, can reuse buffers, and can work with the same data stream interface for everything, including non-disk devices like printers, modems, and network data.

In games, memory mapped files can be great when you're able to heavily preprocess data into its final memory format, so it's completely ready for use at load time. The cooking process is usually enormous, potentially taking many hours to cook all of a game's data, but when the data is properly organized and the game is designed to use it directly, the time saved for players is dramatic, especially when you multiply it by millions of players loading thousands of times.

a light breeze said:

So, what do you think would be a good use-case for memory-mapped files that cannot be accomplished with traditional file i/o? I can think of only one, and it's such a niche case that it might as well not exist: passing the contents of a file to an API call, where the file is too big to fit into physical memory + swap space.

Let's say you have N concurrent processes, each of which needs to access M bytes of the same data from a file, where N*M is orders of magnitude greater than the amount of physical memory available. This isn't a niche case, either: a database, for instance, could meet these criteria, and I've worked on other applications that do as well. This is similar to the API call example you mentioned, except it's beyond the scope of a single process to know or control.

Also, while skimming over the thread again, I was reminded that Puffin mentioned a similar 10x slowdown when accessing an allocated buffer for the first time. So that sounds to me like the overhead for Windows to commit the memory, rather than something specific to memory mapping (which also requires a commit). Given that you're required to allocate a buffer so you can fread() into it, that slowdown exists with both approaches and should be disregarded in the comparison. The comparable test cases then are "memory mapped file" and "fread() to allocated buffer" (from Puffin's earlier results), which are roughly the same. Memory mapping actually looks a wee bit faster.

If the file fits into physical memory, you would only need shared memory, not memory-mapped files. If not, then I agree that that's a fairly compelling case for memory-mapped files.

Also, while skimming over the thread again, I was reminded that Puffin mentioned a similar 10x slowdown when accessing an allocated buffer for the first time. So that sounds to me like the overhead for Windows to commit the memory, rather than something specific to memory mapping (which also requires a commit). Given that you're required to allocate a buffer so you can fread() into it, that slowdown exists with both approaches and should be disregarded in the comparison. The comparable test cases then are "memory mapped file" and "fread() to allocated buffer" (from Puffin's earlier results), which are roughly the same. Memory mapping actually looks a wee bit faster.

But if you can load to existing buffers, or use small buffers for streaming, then fread() is faster. I mentioned a few examples of such cases in my last post.

a light breeze said:
If the file fits into physical memory, you would only need shared memory, not memory-mapped files. If not, then I agree that that's a fairly compelling case for memory-mapped files.

On Windows you use them the same way; for shared memory, a size is provided instead of a file name and it's backed by the page file. So whether you use one or the other comes down to other factors, such as whether or not the data already exists in a file, is generated at run-time, needs to be writable, should be persisted back to disk, etc.

Puffin said:
But if you can load to existing buffers, or use small buffers for streaming, then fread() is faster. I mentioned a few examples of such cases in my last post.

Sure, but the analogous case for memory-mapped files is opening the mapping just once for all the tests, the same way you only allocate an fread() buffer once. Since that test didn't exist, the ones I mentioned were the only ones with the same overhead incurred.

Sure, but the analogous case for memory-mapped files is opening the mapping just once for all the tests, the same way you only allocate an fread() buffer once. Since that test didn't exist, the ones I mentioned were the only ones with the same overhead incurred.

No it's not. The cases I mentioned were not about reading the same file multiple times, but reusing the buffer for different files or using a small buffer for streaming.

This topic is closed to new replies.
