I'm allocating memory using mmap() with the MAP_ANONYMOUS flag. Due to the nature of my problem, I often need to write data to just one page (for example, one located in the middle) of a large memory chunk and leave the rest untouched. So I don't want this rather big unused part to be physically allocated.
As I understand it, such an mmap() call just gives me a pointer to some zero-filled virtual pages, and thanks to the copy-on-write and demand-paging mechanisms, no page (besides maybe the first) is actually allocated in RAM until the first write to it.
The problem: at some point I got MAP_FAILED from mmap() when trying to allocate a chunk larger than my physical RAM size (there had been plenty of successful calls with smaller allocation requests before). So it seems that far more pages are actually allocated, even though they were never written to.
I need your help with two questions. First: is my view of anonymous memory allocation correct, and if not, where are the inaccuracies?
And second:
How can I measure the number of actually allocated pages after an anonymous mmap() is done? I've tried using mincore(), and its results show that almost all pages are "resident" in memory (i.e. physically allocated?). So either the mincore() results are wrong, or I'm totally stuck :(
Upd.
It seems that memory overcommitting, mentioned by @Art, could indeed be the cause. But when I try to disable it (setting /proc/sys/vm/overcommit_memory to mode 1, or using mmap() with the MAP_NORESERVE flag), my machine freezes hard, and a hard reset is the only thing that helps.
"It's complicated." (Caveat: most of my VM-system experience is from systems other than Linux, but I've picked up a few Linux details over the years.)
Generally your assumption is true. Many operating systems don't actually do much on mmap other than record "there is an allocation of X pages of object Y at V". Accessing those pages causes a page fault, which leads to an actual allocation. Some (or maybe all?) versions of Linux map a pre-zeroed read-only page into such allocations (I'm not sure if that has changed or if it's still done) and handle the rest like copy-on-write. (To me this feels like a bad idea, because it should generate more TLB flushes for a dubious optimization of reading zeroed memory, which should happen rarely; but I guess the Linux people have benchmarked it and found it good.)
There are a few caveats. Just because you don't use the RAM doesn't mean you'll be allowed to overcommit that much memory. Certain Linux distributions started defaulting to a setting that disallows overcommit, or disallows more than X% overcommit; CentOS/RedHat was one of them (this was probably a decade ago, I don't know the state today). This means that even when you're using just 5% of your actual physical memory, mmap will fail once you have created anonymous mappings of 100+X% of your RAM. There is a sysctl for it; look it up. It's something like /proc/sys/vm/overcommit_memory or overcommit_ratio, or both, or probably even more under there.
Then you have to be aware of resource limits. Check with ulimit whether you're allowed to create mappings that big (ulimit -v); the problem could be as simple as that. Also (I'm not sure how Linux does it here, but you can write simple test programs to find out), resource limits for data size (ulimit -d) can be accounted against potential pages you could use rather than pages you actually use (in other words, no overcommit in resource limits).
Then let's get back to faulting in only the pages you use. There has been research into detecting access patterns and predicting future faults (so that mmap:ed files can do read-ahead), but I don't know its state in Linux. It feels like it would be dumb to apply this to anonymous mappings, but you never know; maybe someone is trying to be clever. To make sure, use madvise(..., MADV_RANDOM) on the mmap:ed block you get.
Finally, mincore. My experience with it on Linux is that it's garbage; the last time I tried it, it didn't work for anonymous mappings (or was it private ones?). In your case it could be as simple as it reporting the read-only copy-on-write zeroed page backing anonymous mappings as "in core".