Windows memory mapped file twice as slow as fread() when cached

Puffin said:
Regarding multithreading, some sources said that the Windows page mapping stuff is single-threaded and the situation is even worse with multithreading; others said that this has been fixed in the 2017 Creators Update, but I haven't verified any of this yet. This is anyway something one should check before using.

It is true that you can access memory-mapped files from different threads once you have obtained a pointer, and you can also request different virtual pages from different threads; if this did not work, it would be a fatal failure for multithreaded applications. What you do have to do is manage the acquisition yourself. This is a fact you should not ignore, and one worth mentioning, especially for those people reading this who don't have experience with threads.

If you have different threads acquire the same page via an API call, this can potentially cause undesired behavior, so you should always take responsibility for this yourself.
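
To make that concrete, here is a minimal sketch of the safe pattern (my illustration, not code from anyone's test in this thread; "data.bin" is a placeholder path and error handling is mostly omitted): map one view, hand the same pointer to several threads, and give each thread its own range so the reads need no locking.

#include <windows.h>
#include <cstdint>
#include <thread>
#include <vector>

int main()
{
    // "data.bin" is a placeholder path for this sketch.
    HANDLE file = CreateFileA("data.bin", GENERIC_READ, FILE_SHARE_READ,
                              nullptr, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (file == INVALID_HANDLE_VALUE) return 1;

    LARGE_INTEGER size;
    GetFileSizeEx(file, &size);

    HANDLE mapping = CreateFileMappingA(file, nullptr, PAGE_READONLY, 0, 0, nullptr);
    const uint8_t* base = static_cast<const uint8_t*>(
        MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0));

    // One view, one pointer, four threads; each thread sums its own slice.
    // The reads need no synchronization because the ranges are disjoint;
    // page faults are serviced on whichever thread touches a page first.
    const int threadCount = 4;
    const uint64_t total = static_cast<uint64_t>(size.QuadPart);
    const uint64_t chunk = total / threadCount;
    std::vector<uint64_t> sums(threadCount, 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < threadCount; ++i)
        threads.emplace_back([&, i] {
            const uint64_t end = (i == threadCount - 1) ? total : (i + 1) * chunk;
            for (uint64_t off = i * chunk; off < end; ++off)
                sums[i] += base[off];
        });
    for (auto& t : threads) t.join();

    UnmapViewOfFile(base);
    CloseHandle(mapping);
    CloseHandle(file);
    return 0;
}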

Puffin said:
You mentioned that if I opened a second instance of the same file when it's already open, it should be fast, but I tested this earlier too, and that assumption is incorrect

But this is true only as long as you share the memory-mapped file between both instances of the process. If you don't share it, Windows creates two different memory-mapped file instances, with double the number of pages, etc.

On Windows you have to create the MMF in a special way to allow it to be shared between processes, and one process needs to await the completion of the other, or else you may still end up accessing two different MMFs.
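
For illustration, a sketch of the standard Win32 pattern for this (the names "data.bin" and "Local\\MySharedFile" are placeholders I've made up; error handling trimmed): the first process creates a named mapping, the second opens the same mapping by name.

#include <windows.h>

// Process A: create a named mapping backed by the file. The file handle can
// be closed after this; the mapping object keeps the file referenced.
HANDLE CreateSharedMapping()
{
    HANDLE file = CreateFileA("data.bin", GENERIC_READ, FILE_SHARE_READ,
                              nullptr, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (file == INVALID_HANDLE_VALUE) return nullptr;
    return CreateFileMappingA(file, nullptr, PAGE_READONLY, 0, 0,
                              "Local\\MySharedFile");
}

// Process B: open the mapping that process A created. This only succeeds once
// A has finished creating it - that is the "await the completion" part, which
// is usually coordinated with a named event or mutex (omitted here).
HANDLE OpenSharedMapping()
{
    return OpenFileMappingA(FILE_MAP_READ, FALSE, "Local\\MySharedFile");
}

Both processes then call MapViewOfFile() on their respective handles and end up sharing the same physical pages.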

Puffin said:
Regarding caching: I specifically meant that the claim that memory-mapped files wouldn't stay in main memory at all after all instances of the file are closed is incorrect

This is also not true, because if you close all processes that hold a specific MMF, it will be removed from memory entirely, since no one is using it anymore. Why should Windows keep it cached? That would be bad design at best, and a potential security issue at worst.

Again, I agree with @Zipster here. Trusting the people who work with memory-mapped files first, and then thinking for yourself second, is very important, not just on this topic but on all topics. That said, there is no progress if people stop questioning established opinions.


Guys, please take my request for more open-minded thinking seriously. You're trusting the thoughts that first pop up in your mind far too much, as if they were the entire picture.

(On a more general note, the world would be a much better place if everyone studied a little psychology, e.g. cognitive biases and how much stronger they are than people usually realize, so that people would know when to stop and think instead of trusting their intuitions.)

Memory-mapped files aren't some drop-in replacement for your conventional file I/O that will magically make things perform better, or make your code simpler. There is a proper way to use them for which you have to specifically design. Any contrived test case that tries to compare them apples-to-apples to conventional I/O is incorrectly designed, and will lead to these misguided metrics and conclusions.
I would advise anyone reading this thread to do their own research and learn about how and when memory-mapped files are useful.

This thread was never meant to be a general comparison about memory mapped file I/O vs conventional I/O. You guys apparently assumed it was, and turned it into that. It was originally about the surprising performance penalty when reading cached memory mapped files (and only about that). I think this is quite clear in the conclusions section I wrote, and it specifically mentions the situations where the problem might be an actual problem.

I agree 100% that it's better for people to test these things themselves than to trust any of the information written in this thread. But people need to be careful to draw the correct conclusions from the tests, because some people posting in this thread have not. E.g. don't assume that if reading is slower in one case than another, it must be reading from the disk - check whether the numbers make sense; in this case the slower read was still much too fast to be the disk. Don't assume it must be copying/zeroing memory - the numbers show it's too slow to be just that. And don't assume the drive's on-board cache might be holding it, if that cache is much too small.

It is true that you can access memory-mapped files from different threads once you have obtained a pointer, and you can also request different virtual pages from different threads; if this did not work, it would be a fatal failure for multithreaded applications. What you do have to do is manage the acquisition yourself. This is a fact you should not ignore, and one worth mentioning, especially for those people reading this who don't have experience with threads.
If you have different threads acquire the same page via an API call, this can potentially cause undesired behavior, so you should always take responsibility for this yourself.

By "single-threaded", I wasn't referring to thread-safety of the API. I meant that (according to some sources) the code inside Windows that handles page faults and updates the page tables is single-threaded and does not scale across multiple threads (which I haven't verified myself).

But this is true only as long as you share the memory-mapped file between both instances of the process. If you don't share it, Windows creates two different memory-mapped file instances, with double the number of pages, etc.

No, that does not seem to be the case. I verified this again just now: even when using the same CreateFile() and CreateFileMapping() handles, just opening two views with MapViewOfFile() is equally slow (the first read through each newly created view, that is; the second read is of course fast once the pages have already been mapped).

ptr1: 00000291E3D20000
read 1: 2.48832 GB/s
read 1: 24.1804 GB/s
read 1: 23.4896 GB/s
read 1: 24.1995 GB/s
read 1: 23.4662 GB/s
ptr2: 00000292D9EC0000
read 2: 2.64585 GB/s
read 2: 23.7152 GB/s
read 2: 22.0792 GB/s
read 2: 23.5864 GB/s
read 2: 22.0855 GB/s
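
A minimal sketch of a test of this shape (my reconstruction for illustration, not the exact code that produced the numbers above; "data.bin" is a placeholder and error handling is omitted): one file handle, one mapping handle, two separate views, each read five times.

#include <windows.h>
#include <chrono>
#include <cstdio>

volatile unsigned long long g_sink;   // keeps the read loop from being optimized away

static double ReadGBps(const char* p, size_t n)
{
    auto t0 = std::chrono::steady_clock::now();
    unsigned long long sum = 0;
    // Read 8 bytes at a time; the view is page-aligned, so this stays aligned.
    for (size_t i = 0; i + 8 <= n; i += 8)
        sum += *reinterpret_cast<const unsigned long long*>(p + i);
    g_sink = sum;
    auto t1 = std::chrono::steady_clock::now();
    return n / std::chrono::duration<double>(t1 - t0).count() / 1e9;
}

int main()
{
    HANDLE file = CreateFileA("data.bin", GENERIC_READ, FILE_SHARE_READ,
                              nullptr, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    LARGE_INTEGER size;
    GetFileSizeEx(file, &size);
    HANDLE mapping = CreateFileMappingA(file, nullptr, PAGE_READONLY, 0, 0, nullptr);

    for (int view = 1; view <= 2; ++view)
    {
        // Same handles both times; only the view (virtual address range) is new.
        const char* p = static_cast<const char*>(
            MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0));
        std::printf("ptr%d: %p\n", view, static_cast<const void*>(p));
        for (int i = 0; i < 5; ++i)
            std::printf("read %d: %g GB/s\n", view,
                        ReadGBps(p, static_cast<size_t>(size.QuadPart)));
        // Views are intentionally left mapped, as in the output above.
    }
    return 0;
}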
This is also not true, because if you close all processes that hold a specific MMF, it will be removed from memory entirely, since no one is using it anymore. Why should Windows keep it cached? That would be bad design at best, and a potential security issue at worst.

Windows uses unallocated memory in the system to cache the contents of recently-accessed files. The reason is that it makes future reads faster, just like any other cache. This is also very common knowledge; I'm surprised if you didn't know it. And how on earth is this cache "bad design"? It's not really considered a "security issue" either. Perhaps you're thinking about something else here, but I don't know what. Anyway, my original point and evidence are clear.

Whether Windows caches files when reading them via memory mapping, or only when using conventional I/O, was another question; but as I said a couple of times already, I verified this, and indeed it does. And of course it should - why would it be any different from conventional file access?
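
This is easy to observe with conventional I/O alone. A small sketch (assuming a placeholder file "data.bin" that is much larger than the drive's internal cache): read the same file twice and compare throughput; on the second pass the data comes from memory, not the disk.

#include <chrono>
#include <cstdio>
#include <vector>

int main()
{
    std::vector<char> buf(1 << 20);   // 1 MB read buffer
    for (int pass = 1; pass <= 2; ++pass)
    {
        FILE* f = std::fopen("data.bin", "rb");
        if (!f) return 1;
        size_t total = 0, n;
        auto t0 = std::chrono::steady_clock::now();
        while ((n = std::fread(buf.data(), 1, buf.size(), f)) > 0)
            total += n;                // pass 1 hits the disk, pass 2 the cache
        auto t1 = std::chrono::steady_clock::now();
        std::fclose(f);
        double s = std::chrono::duration<double>(t1 - t0).count();
        std::printf("pass %d: %.2f GB/s\n", pass, total / s / 1e9);
    }
    return 0;
}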

But by now I've spent a lot of time thinking about, testing, and verifying these things, and every time someone claims something, I've racked my brain over whether there could be something I'm missing, and tried to verify everything, even though everything points to the opposite. At the same time, you guys are just posting whatever pops into your minds without testing anything, or bothering to check whether your conclusions are even logical in the first place. Unless there is some seriously new information, preferably with backing test results or other credible verification, I must conclude that all the relevant information is still in the summary I wrote, and most of everything else is unrelated or downright misinformation.

Just realized that some people may have been assuming this discussion is about shared-memory (memory-only) files? No, it's about reading disk files with either memory mapped files or fread(), in which case there is a surprisingly high overhead with memory mapped files, on Windows. One would expect reading the memory mapped file when it's already in OS disk cache to be almost as fast as reading memory, but it's 10 times slower (and 2x slower than fread()). See details in the summary post earlier.

I also tested now the shared-memory case out of curiosity, and this applies there as well, as I would have expected. I.e. after opening a second view of the same shared-memory file that is already open, reading it the first time is 10x slower than reading from memory in general.


Just to be clear, the part I'm objecting to is "when it's already in OS disk cache". That is not how the cache works. That is the part that makes the assumptions wrong. The Windows cache manager doesn't work on a logical block basis, where you can completely close a file, reopen the file, and expect it to be in the cache. It works on a virtual block basis, where you open a file, jump around through the open file and it stays in the cache, then close the file and the cache is released.

Regarding shared memory, read carefully. You need to open a shared view of the file so there is only a single instance loaded, not a second instance. If you open a second instance then the new instance also incurs those costs.

What I've been trying to say is that your information about how the cache works is wrong. It's easy to test and verify, please do.

It's also apparent in your posts that after you got the first impression that I would be a noob, you've been interpreting everything I say to fit that assumption. For example, when I say the file is already cached in memory, you think I must be confusing it with another cache, because it cannot possibly be the case that I'm actually correct and there is a cache you weren't aware of. Or when I talk about opening two views on the same file, you think I must have actually been creating two files, because I can't possibly be correct that accessing the same file through a second view is actually slow. Please consider this possibility now.

I'll try to reformulate one last time what I'm claiming (the second point below being the original point of this thread, the first one I thought was obvious to everyone):

  • Windows caches the data of recently-accessed disk files in otherwise-unused memory (edit: also after the file has been closed). Open a big file that you haven't yet opened since the last system restart, read it fully, close it. Repeat. Notice that the second time is much faster than the first (although slower than simply reading memory, but we'll return to this in the next point). Notice that the second time is faster than the disk speed (if your disk is not blazing-fast). Notice that e.g. Task Manager shows disk activity the first time but not the second time. Notice that this occurs with files or groups of files that are bigger than the on-disk cache, so it cannot be cached just there. Notice that if you read multiple files this way whose total size exceeds your free main memory size, it starts reading from the disk again, because the files no longer fit in the cache. The conclusion: Windows definitely caches the data of recently-accessed files in memory. This occurs regardless of whether you are reading the file using a memory mapped file or conventional I/O.
  • When you use a memory mapped file to read a disk file that has been read recently and is therefore still cached in main memory, it is 10x slower to read the mapped memory than it is to read a normal region of main memory (but faster than reading from the disk). If you read from the same mapped view (same region of application memory space) again, it is as fast as reading from main memory. If you open another view of the same file, then regardless of whether the first view is still open or not, reading from the second view for the first time is also 10x slower than reading from main memory. The same occurs with shared-memory-only files: if you open a second view on the same file instance, reading from the second view for the first time is 10x slower than reading from main memory. The same occurs after allocating a large memory block (bigger than 0.5-1.0 MB): reading/writing the newly-allocated block is 10x slower than reading/writing a block that has been accessed earlier (see the sketch after this list). This is not because of the CPU cache, since it also occurs with blocks much bigger than the caches. It's not because the file is being read from the disk (as explained in the previous point). It's not because the memory is being zeroed/copied, because that wouldn't take 10x the time of normally reading/writing memory, and because it would defeat the sharing of a mapped file and make e.g. the hundreds-of-tool-instances case not work. The cause is almost certainly the Windows page mapping implementation being suboptimal (see e.g. the link posted earlier), but the exact cause is not the most important thing; the important thing is that the phenomenon is consistent.
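
Here is the promised sketch of the allocation variant of the effect, which needs no file at all (sizes are arbitrary illustration values): the first pass over a freshly committed block pays the per-page mapping cost, the second pass does not.

#include <windows.h>
#include <chrono>
#include <cstdio>

int main()
{
    const size_t n = 256ull << 20;   // 256 MB, well above any CPU cache
    char* p = static_cast<char*>(
        VirtualAlloc(nullptr, n, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE));
    if (!p) return 1;

    for (int pass = 1; pass <= 2; ++pass)
    {
        auto t0 = std::chrono::steady_clock::now();
        for (size_t i = 0; i < n; i += 4096)
            p[i] = 1;                // touch one byte per page
        auto t1 = std::chrono::steady_clock::now();
        std::printf("pass %d: %f s\n", pass,
                    std::chrono::duration<double>(t1 - t0).count());
    }
    VirtualFree(p, 0, MEM_RELEASE);
    return 0;
}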

See my earlier post with the big "summary / conclusion" title for some more details and a test that verifies this. Because the discussion here has not been objective or rational, I'll have to stop spending time on it. I'll just say to anyone interested that if people continue to come up with counter-arguments here, don't trust them, just test this yourself, it's very easy actually.

Allow me to be more direct - your test cases don't utilize memory-mapped files optimally or appropriately. There's nothing surprising about what you're seeing other than that it contradicts your initial expectations and assumptions about how it should perform in some contrived scenario. Since your code treats it as analogous to conventional file I/O in usage and behavior, the comparison between them is implicit even if you don't realize it. I'm not sure why you repeatedly insist that the problem must be some sub-optimal implementation in Windows, instead of your own lack of understanding of how and when file mapping should be used, and why. No one here is assuming you're a noob, but we do speak from a place of knowledge and experience, and you are being inexplicably obstinate.

At the end of the day, what's the point in hypothesizing about the internal behavior of the OS? Are you in a position to know for sure and/or do anything about it? Ultimately all you can do is pick the best solution for your problem based on the information you have (and validly-collected data). Memory-mapped files are one of those niche features where, unless you know for sure that you need them and that they will definitely help, you probably don't need them, and they will likely hurt you.

So much wrong with the above...

Look, anything that can be implemented with memory-mapped files can also be implemented with traditional file I/O, and vice versa. You can choose between them on the basis of performance or convenience, and when you choose on the basis of performance you must necessarily make a direct comparison between memory-mapped files and traditional file I/O. Just be sure to tune the test to match your actual access patterns.

Fact is, memory-mapped file performance is crap under Windows, both in comparison to other operating systems and in comparison to traditional file I/O. If your file is small, you are almost always better off loading it directly into memory in its entirety. If your file is too large to load up-front, you may be better off implementing your own file caching system or just accessing it indirectly by going through traditional (but cached) file I/O for each access. There are few if any cases where memory-mapped files are the best solution for performance under Windows. On other operating systems, you might be able to go all memory-mapped all the time with no performance penalty.
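
In outline, the "load the whole small file" approach looks like this (a sketch with a placeholder path and minimal error handling; long/ftell limits it to files under 2 GB, which is fine for the small-file case being discussed):

#include <cstdio>
#include <vector>

std::vector<char> ReadWholeFile(const char* path)
{
    std::vector<char> buf;
    FILE* f = std::fopen(path, "rb");
    if (!f) return buf;
    std::fseek(f, 0, SEEK_END);
    long n = std::ftell(f);         // file size; limited to < 2 GB with long
    std::fseek(f, 0, SEEK_SET);
    buf.resize(static_cast<size_t>(n));
    std::fread(buf.data(), 1, buf.size(), f);
    std::fclose(f);
    return buf;
}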

I understand that you've never encountered a situation that would truly benefit from memory-mapped files over traditional I/O, much less one that's physically impossible to handle without them. However, they do exist, and if you spend some time thinking about the nature of memory virtualization and its limits (in any OS), it's really not hard to come up with a few cases.

So, what do you think would be a good use-case for memory-mapped files that cannot be accomplished with traditional file I/O? I can think of only one, and it's such a niche case that it might as well not exist: passing the contents of a file to an API call, where the file is too big to fit into physical memory + swap space.

a light breeze said:

So, what do you think would be a good use-case for memory-mapped files that cannot be accomplished with traditional file I/O? I can think of only one, and it's such a niche case that it might as well not exist: passing the contents of a file to an API call, where the file is too big to fit into physical memory + swap space.

I'd say it's the other way around - what's a good use case for having to open a file, check its length, choose how much memory to allocate, read it bit by bit, etc., vs. simply mapping the file and letting the OS manage mapping the relevant parts into memory? This seems like a perfect situation where the OS can usually do a better job than you, leaving the explicit file I/O API for very specific optimisations.
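
For contrast with the fread() bookkeeping just described, a sketch of the mapped version of the same "give me the file contents" operation (placeholder path; in real code the file and mapping handles should be kept and closed, and failures checked):

#include <windows.h>

const void* MapWholeFile(const char* path, size_t* outSize)
{
    HANDLE file = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ,
                              nullptr, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (file == INVALID_HANDLE_VALUE) return nullptr;

    LARGE_INTEGER size;
    GetFileSizeEx(file, &size);
    *outSize = static_cast<size_t>(size.QuadPart);

    // No allocation, no read loop: the OS pages the file in on demand.
    HANDLE mapping = CreateFileMappingA(file, nullptr, PAGE_READONLY, 0, 0, nullptr);
    return MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0);
}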

This topic is closed to new replies.
