Tags: windows, linux, operating-system, virtual-memory, pagefile

How can I get read-ahead bytes?


Operating systems read more from disk than a program actually requests, because the program is likely to need nearby data soon. In my application, when I fetch an item from disk, I would like to show an interval of information around that item. There's a trade-off between how much information I request and show, and speed. However, since the OS has already read more than I requested, accessing the bytes that are already in memory is essentially free. What API can I use to find out what's in the OS caches?

Alternatively, I could use memory-mapped files. In that case, the problem reduces to finding out whether a page is resident in memory or would have to be read from disk. Can this be done in any common OS?

EDIT: Related paper http://www.azulsystems.com/events/mspc_2008/2008_MSPC.pdf


Solution

  • You can indeed use your second method, at least on Linux. mmap() the file, then use the mincore() function to determine which pages are resident. From the man page:

    int mincore(void *addr, size_t length, unsigned char *vec);

    mincore() returns a vector that indicates whether pages of the calling process's virtual memory are resident in core (RAM), and so will not cause a disk access (page fault) if referenced. The kernel returns residency information about the pages starting at the address addr, and continuing for length bytes.

    There's of course a race condition here: mincore() can tell you that a page is resident, but it might then be evicted just before you access it. C'est la vie.
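A minimal sketch of the mmap() plus mincore() approach, assuming Linux with glibc. The helper name count_resident_pages is made up for illustration and is not part of any library; it maps a file read-only, asks mincore() for the residency vector, and counts the pages whose low bit is set. Error handling is deliberately coarse (every failure returns -1), and the result is only a snapshot, for the race reason given above.

```c
/* Expose mincore() in glibc; must come before the first #include. */
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not a library function): returns the number of
 * pages of the file at `path` that are currently resident in RAM, or -1
 * on any error. On success, *total_pages receives the number of pages
 * the file spans. */
long count_resident_pages(const char *path, size_t *total_pages)
{
    long resident = -1;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    long page = sysconf(_SC_PAGESIZE);
    /* mmap() rejects zero-length mappings, so treat empty files as errors. */
    if (fstat(fd, &st) != 0 || st.st_size <= 0 || page <= 0) {
        close(fd);
        return -1;
    }

    size_t length = (size_t)st.st_size;
    size_t pages = (length + (size_t)page - 1) / (size_t)page;

    /* Mapping the file does not read it; pages are faulted in on access. */
    void *addr = mmap(NULL, length, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (addr == MAP_FAILED)
        return -1;

    /* mincore() fills one byte per page; the low bit means "resident". */
    unsigned char *vec = malloc(pages);
    if (vec != NULL && mincore(addr, length, vec) == 0) {
        resident = 0;
        for (size_t i = 0; i < pages; i++)
            resident += vec[i] & 1;
        *total_pages = pages;
    }

    free(vec);
    munmap(addr, length);
    return resident;
}
```

For the use case in the question, the same vec array could be scanned outward from the page holding the requested item, stopping at the first non-resident page on each side, to find the interval that can be shown without further disk reads.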