Operating systems read more from disk than a program actually requests, because the program is likely to need nearby data soon. In my application, when I fetch an item from disk, I would like to show an interval of information around that element. There's a trade-off between how much information I request and show, and speed. However, since the OS already reads more than I requested, accessing the bytes that are already in memory is essentially free. What API can I use to find out what's in the OS caches?
Alternatively, I could use memory-mapped files. In that case, the problem reduces to finding out whether a page is swapped to disk or not. Can this be done in any common OS?
EDIT: Related paper http://www.azulsystems.com/events/mspc_2008/2008_MSPC.pdf
You can indeed use your second method, at least on Linux: mmap() the file, then use the mincore() function to determine which pages are resident. From the man page:
int mincore(void *addr, size_t length, unsigned char *vec);
mincore() returns a vector that indicates whether pages of the calling process's virtual memory are resident in core (RAM), and so will not cause a disk access (page fault) if referenced. The kernel returns residency information about the pages starting at the address addr, and continuing for length bytes.
There's of course a race condition here - mincore() can tell you that a page is resident, but it might then be swapped out just before you access it. C'est la vie.
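To make the mmap() plus mincore() approach concrete, here is a minimal sketch, assuming Linux with glibc. The helper name count_resident_pages is made up for illustration and is not part of any library. It maps a file read-only, asks mincore() for the residency vector, and counts the pages whose least significant bit is set. Note that some kernels may report every page as resident for files the caller cannot write to, so treat the result as a hint only.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* assumption: glibc, which needs this to declare mincore() */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a library function): returns the number of
 * pages of `path` that mincore() reports as resident, or -1 on error.
 * If total_pages is non-NULL, it receives the page count of the mapping.
 */
long count_resident_pages(const char *path, size_t *total_pages)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);          /* empty files cannot be mapped */
        return -1;
    }

    size_t length = (size_t)st.st_size;
    size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = (length + page_size - 1) / page_size;

    void *addr = mmap(NULL, length, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);              /* the mapping stays valid after close() */
    if (addr == MAP_FAILED)
        return -1;

    /* mincore() fills one byte per page; bit 0 set means "resident". */
    unsigned char *vec = malloc(npages);
    long resident = -1;
    if (vec != NULL && mincore(addr, length, vec) == 0) {
        resident = 0;
        for (size_t i = 0; i < npages; i++)
            resident += vec[i] & 1;
        assert((size_t)resident <= npages);
    }

    free(vec);
    munmap(addr, length);
    if (total_pages != NULL)
        *total_pages = npages;
    return resident;
}
```

A caller would pass a file path and compare the return value with the total page count. To decide how wide an interval to show, you would inspect the individual vec entries around the page of interest instead of summing them, keeping in mind that the answer can be stale by the time you touch the page.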