How can we control the resident window (RSS) when mapping a large file? Let me explain what I mean. Suppose we have a large file that exceeds RAM several times over, and several processes map it with a shared mmap. When we access an object whose virtual address lies in the mapped region, we take a page fault and the page is read from disk. The sub-question is: does the opposite happen once we no longer use that object? If eviction works like an LRU, what is the size of the LRU and how can it be controlled? How is the page cache involved in this case?
This is the RSS graph on a test instance (2 threads, 8 GB RAM) for an 80 GB tar file. Where does the value of 3800 MB come from, and why does it stay stable as I run through the file after it has been mapped? How can I control it (or advise the kernel to control it)?
As long as you're not taking explicit action to lock the pages in memory, they should eventually be evicted again automatically. (For a file-backed shared mapping this means clean pages are simply dropped and dirty pages are written back to the file, rather than going to swap.) The kernel basically uses a memory pressure heuristic to decide how much of physical memory to devote to resident pages, and frequently rebalances as needed.
If you want to take a more active role in controlling this process, have a look at the madvise() system call.
This allows you to tweak the paging algorithm for your mmap, with actions like:
MADV_FREE (since Linux 4.5)
MADV_COLD (since Linux 5.4)
MADV_SEQUENTIAL
MADV_WILLNEED
MADV_DONTNEED
Issuing an madvise(MADV_SEQUENTIAL) after creating the mmap might be sufficient to get acceptable behavior. If not, you could also intersperse some MADV_WILLNEED/MADV_DONTNEED access hints (and/or MADV_FREE/MADV_COLD) during the traversal as you pass groups of pages. Note that MADV_FREE applies only to private anonymous memory, so it will not help with a shared file mapping.
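As a rough illustration of that traversal pattern, here is a minimal sketch in C, assuming Linux with glibc and a 64-bit process. The function name scan_with_window and the window parameter are hypothetical, chosen only for this example. The sketch maps the file shared and read-only, hints sequential access, and releases each window with MADV_DONTNEED once it has been read. Summing the bytes merely stands in for real work on the data.

```c
#define _DEFAULT_SOURCE            /* exposes madvise() and MADV_* on glibc */
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: walk a file through a shared mapping, dropping each
 * window from this process's resident set after it has been read.
 * Returns 0 on success, -1 on error; the byte sum is stored in *sum_out. */
int scan_with_window(const char *path, size_t window, uint64_t *sum_out)
{
    *sum_out = 0;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {          /* nothing to map; mmap rejects length 0 */
        close(fd);
        return 0;
    }
    size_t len = (size_t)st.st_size;

    unsigned char *base = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (base == MAP_FAILED)
        return -1;

    /* madvise() needs page-aligned start addresses, so round the window
     * up to a whole number of pages (and to at least one page). */
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    if (window < page)
        window = page;
    window = (window + page - 1) / page * page;

    /* Advisory only: ask for aggressive read-ahead. Failure is not fatal. */
    (void)madvise(base, len, MADV_SEQUENTIAL);

    uint64_t sum = 0;
    for (size_t off = 0; off < len; off += window) {
        size_t n = (len - off < window) ? len - off : window;

        for (size_t i = 0; i < n; i++)   /* placeholder for real work */
            sum += base[off + i];

        /* Unmap this window's pages from the process, which lowers RSS.
         * The data may still sit in the page cache until the kernel
         * reclaims it; touching it again simply faults it back in. */
        (void)madvise(base + off, n, MADV_DONTNEED);
    }

    munmap(base, len);
    *sum_out = sum;
    return 0;
}
```

Calling scan_with_window("big.tar", (size_t)64 << 20, &sum) would keep roughly one 64 MiB window mapped at a time. The window size is a tuning knob rather than a recommendation: smaller windows mean more madvise() calls, larger ones mean a higher RSS plateau.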