How it gets cached is a bit different.
To begin with, as you likely know, a memory mapped file view is a virtual memory area that is backed by the data file rather than by the normal paging system. In that regard, the process just treats it as virtual memory, and VM mapping works its magic.
With normal memory areas, when you read or write the memory, the normal virtual memory system will ensure the correct memory (the virtual memory mapping) is paged in, transferring data to and from the drive (from the paging files) if necessary to ensure the virtual memory access appears correct to the program.
With memory mapped file views, when you read or write the memory, exactly the same thing happens. The system ensures the correct memory (a mapped view of the file) is paged in, transferring data to and from the drive (from the data file) if necessary to ensure the virtual memory access appears correct to the program.
Notice the two descriptions are nearly identical. The only difference is that the virtual memory addresses are backed by the data file rather than the paging file.
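To make that concrete, here is a small sketch using Python's cross-platform mmap module (which wraps MapViewOfFile on Windows); the file name and contents are placeholders invented for the example:

```python
import mmap
import os
import tempfile

# Hypothetical example file, created just for this sketch.
path = os.path.join(tempfile.mkdtemp(), "data.bin")
with open(path, "wb") as f:
    f.write(b"hello, mapped world")

with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
        # Indexing the view is an ordinary memory access; touching an
        # unmapped page faults, and the OS pages the file data in.
        first = view[0:5]        # b"hello"
        everything = view[:]     # the whole file as bytes
```

From the program's point of view there is no "read" call at all; the page fault machinery does the I/O.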
The file system cache treats memory mapped files as potentially shared objects. Reads and writes are managed through the virtual memory system. When data needs to be loaded due to a VM miss, the OS loads it from the mapped file. When the OS wants to reclaim unused memory, it can purge blocks. When multiple processes map the same file, they share the same view. Critically, there is only a single copy of the data in use.
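A quick way to see that single-copy behavior is to map the same file twice and write through one view: the other view observes the change with no explicit read or flush, because both views are backed by the same cached pages. A sketch in Python, with a made-up file name:

```python
import mmap
import os
import tempfile

# Hypothetical file for the sketch.
path = os.path.join(tempfile.mkdtemp(), "shared.bin")
with open(path, "wb") as f:
    f.write(b"AAAA")

f1 = open(path, "r+b")
f2 = open(path, "rb")
view_a = mmap.mmap(f1.fileno(), 0, access=mmap.ACCESS_WRITE)
view_b = mmap.mmap(f2.fileno(), 0, access=mmap.ACCESS_READ)

view_a[0:4] = b"BBBB"          # write through the first view

# The second view sees the change immediately: both views reference
# the same pages, so there is only one copy of the data.
result = bytes(view_b[0:4])

view_a.close(); view_b.close()
f1.close(); f2.close()
```

The same coherence holds across processes mapping the same file, which is what makes mapped views usable as shared memory.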
In contrast, opening a file through CreateFile() or fopen() or similar uses the Windows cache manager. When you read or write through the file handle, a chunk of data is loaded from the disk into the system cache managed by the OS (e.g., a 256KB block is reserved within the system cache, then 256KB is loaded from disk). The system cache may have many thousands of blocks loaded from various files. Next, the system duplicates the data -- a second copy of the data -- into your process's address space as the data that was read. Often that data is parsed and stored -- a third copy of the data -- before the second copy is released and the file is closed.
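The copy chain is easy to picture with a buffered read in Python; the file layout here is invented, and struct parsing stands in for whatever deserialization you would actually do:

```python
import os
import struct
import tempfile

# Hypothetical record file: three little-endian 32-bit ints.
path = os.path.join(tempfile.mkdtemp(), "records.bin")
with open(path, "wb") as f:
    f.write(struct.pack("<3i", 10, 20, 30))

# Copy 1 lives in the OS file cache and is invisible to us.
with open(path, "rb") as f:
    raw = f.read()                        # copy 2: a buffer in our process

values = list(struct.unpack("<3i", raw))  # copy 3: the parsed representation
del raw                                   # copy 2 released; copy 3 remains
```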
The Windows file system cache is managed on a file object basis as a virtual block cache. When you close the file, the cache is released, and it cannot be reused for another process loading the file.
You might not be as familiar with those caches, so I'll explain them too. There are two options: a logical block cache and a virtual block cache. A logical block cache mirrors what is on the physical disk; the disk's logical block address is loaded and kept around. Because a logical block cache represents what is physically on the disk at that location, it can be shared and reused. Windows' virtual block cache is instead based on blocks of a file: the data is cached regardless of its location on the disk. Because the virtual block cache represents only the data, other systems can modify the underlying details of a file, enabling features like shadow copies. (A backup system can open a shadow copy of a file that remains valid even while another program opens and writes to that file, because the virtual blocks are what matter, not the logical blocks as they sit on the physical disk.)
All of this means that if you open a file, close it, then open it again, you don't get any caching benefit from the Windows cache manager. The benefit comes when you keep a file open and use random access to jump around, or when doing many small reads rather than large block reads.
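That access pattern (one handle kept open, seeking around) looks like this sketch in Python; the cache hits themselves aren't observable from the code, but this is the shape of access that lets the cache manager's loaded blocks get reused:

```python
import os
import tempfile

# Hypothetical data file with a predictable byte pattern.
path = os.path.join(tempfile.mkdtemp(), "big.bin")
with open(path, "wb") as f:
    f.write(bytes(range(256)) * 16)      # 4 KB of patterned data

# One handle stays open while we jump around with small reads; the
# blocks the cache manager loaded for the first read can serve the rest.
with open(path, "rb") as f:
    f.seek(1000)
    a = f.read(4)
    f.seek(16)
    b = f.read(4)
```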
Many hard drives have their own cache mechanisms, and they typically implement logical block caches. SSDs tend to have smaller caches, and NVMe disks are different as well.
So putting those all together...
Because you opened a memory mapped view of a file, traversed it once, closed the view, then opened it again and repeated the process, you should expect to see essentially the same performance every time. Nothing gets cached between runs.
If you had opened one program with a memory mapped view of a file, left the file open, and then opened a second memory mapped view of the file, the second memory mapped view would be much faster, potentially near-instant because the virtual memory is already mapped into RAM.
When you closed the memory mapped view of the file, any dirty pages were written back, and the virtual blocks it was using were discarded, because file cache blocks don't survive the file object being closed. The next time you opened it, new virtual blocks were used.
It is possible that other caches were involved, such as a cache on the disk drive itself, or caches by the disk drive controller, which stored the logical blocks from the disk for faster access. That would be independent of what Windows or your program were doing.
When you opened a copy of the file with fopen() or CreateFile(), blocks of the file were loaded into the Windows file system cache. As you read the file, blocks were loaded, probably as fast as they could be copied. Those triggered more memory allocations. Then you allocated new blocks of memory and made a second copy of the data for your process, which is the copy you used.
When you closed the file objects, the Windows file system cache discarded the old virtual blocks because they were no longer in use. The next time you opened it, new virtual blocks were used.
Just like with the mapping, it is possible that other caches were involved on the disk drive or the drive's controller, but they were independent of Windows and your program.
Each of those memory allocations was a secondary act that could have had a performance impact. Exactly how fast the allocations were isn't measured in your example; a profiler could tell you exactly how long they took. As you pointed out with your link, memory allocations take variable amounts of time depending on what the OS needs to do. Worst case, it needs to reclaim virtual memory space and wipe the memory clean for security purposes before it can assign it to your process and then let your memory allocator subdivide it for use. Best case, it already has some memory clean and assigned to your process, so the only work is in your own memory allocator. Frequently it must search the global allocation tables for an already-zeroed large block, assign that block from the system's open pool to your process, then perform the allocation from the block within your process.
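If you want a rough feel for that cost, you can time a large allocation yourself; exact numbers vary wildly with OS state and allocator, so treat this as a measurement sketch, not a benchmark:

```python
import time

t0 = time.perf_counter()
# 64 MB, zero-filled; in CPython the buffer is zeroed at creation,
# so this timing covers both the allocation and touching the pages.
block = bytearray(64 * 1024 * 1024)
t1 = time.perf_counter()

alloc_ms = (t1 - t0) * 1000.0
```

Run it twice in a row and the second allocation will often be faster, for exactly the reasons above: the allocator may already hold suitable clean memory.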