My program works with large data sets that need to be stored in contiguous memory (several gigabytes). Allocating memory using `std::allocator` (i.e. `malloc` or `new`) causes system stalls as large portions of virtual memory are reserved and physical memory gets filled up.
Since the program will mostly only work on small portions at a time, my question is whether using memory-mapped files would provide an advantage (i.e. `mmap` or the Windows equivalent). That is, creating a large sparse temporary file and mapping it into virtual memory. Or is there another technique that would change the system's paging strategy so that fewer pages are loaded into physical memory at a time?

I'm trying to avoid building a streaming mechanism that loads portions of a file at a time, and instead rely on the system's VM paging.
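Concretely, something like the following Linux/POSIX sketch is what I have in mind; the path and size are placeholders, and error handling is trimmed:

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <sys/mman.h>
#include <unistd.h>

int main() {
    const std::size_t kSize = 8ull << 30;  // 8 GiB of address space, mostly untouched

    // Create a temporary file and unlink it so it disappears on close.
    char path[] = "/tmp/bigdata-XXXXXX";   // placeholder location
    int fd = mkstemp(path);
    if (fd < 0) { std::perror("mkstemp"); return 1; }
    unlink(path);

    // Give the file its logical size; no disk blocks are allocated yet,
    // so the file stays sparse until pages are actually written.
    if (ftruncate(fd, static_cast<off_t>(kSize)) != 0) { std::perror("ftruncate"); return 1; }

    // Map it. Pages are faulted in, and written back, on demand by the VMM.
    void* base = mmap(nullptr, kSize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) { std::perror("mmap"); return 1; }

    // Touch only a small window; only those pages consume physical memory.
    auto* data = static_cast<unsigned char*>(base);
    data[0] = 1;
    data[kSize / 2] = 2;

    munmap(base, kSize);
    close(fd);
    return 0;
}
```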
Yes, `mmap` has the potential to speed things up.
Things to consider:

- `malloc` and `free` will use `mmap` with `MAP_ANON` anyway. So the difference in memory mapping a file is simply that you are getting the VMM to do the I/O for you.
- Use `madvise` with `mmap` to assist the VMM in paging well (a sketch follows this list).
- If you use `open` and `read` (plus, as erenon suggests, `posix_fadvise`; see the second sketch below), your file is still held in buffers anyway (i.e. it's not immediately written out) unless you also use `O_DIRECT`. So in both situations, you are relying on the kernel for I/O scheduling.
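To illustrate the `madvise` point, here is a minimal sketch; the helper name `hint_mapping` and the window parameters are hypothetical, and `base`/`len` refer to a region obtained from `mmap` as above:

```cpp
#include <cstddef>
#include <sys/mman.h>

// Hypothetical helper: advise the VMM on how a mapped region will be used.
// Both base and base + off must be page-aligned.
void hint_mapping(void* base, std::size_t len, std::size_t off, std::size_t win) {
    // Expect scattered accesses across the mapping, so the kernel
    // should not waste effort on aggressive read-ahead.
    madvise(base, len, MADV_RANDOM);

    // Ask the kernel to start paging in the window we are about to touch.
    madvise(static_cast<char*>(base) + off, win, MADV_WILLNEED);
}
```

For linear scans, `MADV_SEQUENTIAL` is the usual counterpart, and `MADV_DONTNEED` lets you release a window you are finished with.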
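And for the plain `open`/`read` path, the `posix_fadvise` call erenon mentions looks roughly like this (the file name and helper are placeholders):

```cpp
#include <fcntl.h>
#include <unistd.h>

// Hypothetical read-path setup: advise the kernel that the file will be
// scanned sequentially so it can use larger read-ahead.
int open_for_streaming(const char* path) {
    int fd = open(path, O_RDONLY);
    if (fd >= 0) {
        posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);  // len 0 == whole file
    }
    return fd;  // caller read()s and close()s as usual
}
```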