I am calling mmap() with MAP_SHARED and PROT_READ to access a file which is about 25 GB in size. I have noticed that advancing the returned pointer has no effect on %MEM in top for the application, but once I start dereferencing the pointer at different locations, memory usage climbs rapidly and caps at 55%. That value goes back down to 0.2% once munmap is called.
I don't know whether I should trust the 55% value that top reports. It doesn't seem like the process is actually using 8 GB of the available 16. Should I be worried?
When you first map the file, all mmap() does is reserve address space; it doesn't necessarily read anything from the file unless you pass MAP_POPULATE (the OS might do a little prefetching, but it's not required to, and often doesn't until you begin reading/writing).
When you read from a given page of the mapping for the first time, this triggers a page fault. This isn't the "invalid page fault" most people think of when they hear the name; it's either:

- a minor (soft) fault: the page is already in the kernel's page cache, and the fault handler just wires it into your process's page tables, or
- a major (hard) fault: the page isn't cached yet, so the kernel has to read it from disk before mapping it in.

Either way, the page now counts toward your resident set, which is why %MEM climbs as you dereference new locations.
The behavior you're seeing is likely due to the mapped file being too large to fit in memory alongside everything else that wants to stay resident, so:

- As you fault pages in, your resident set grows until it hits a ceiling (your 55%) where the kernel starts evicting older pages of the mapping to make room for new ones. Since the mapping is read-only and file-backed, those pages are clean and can be dropped without writing anything back.
- When you call munmap, the accounting against your process is dropped. Many of the pages are likely still in the kernel's page cache, but unless they're remapped and accessed again soon, they're first on the chopping block to be discarded if something else requests memory.

And as commenters noted, accounting for shared memory-mapped files gets weird; every process is "charged" for the memory, but they'll all report it as shared even if no other process maps it. So it's not practical to distinguish "shared because it's MAP_SHARED and backed by kernel cache, but no one else has it mapped, so it's effectively uniquely owned by this process" from "shared because N processes are mapping the same data, reporting shared_amount * N usage cumulatively, but actually only consuming shared_amount memory in total (plus a trivial amount to maintain the per-process page tables for each mapping)". There's no reason to be worried if the tallies don't line up.