Search code examples
windowswinapibufferdiskhard-drive

Why are read operations much faster after writing files using OS and disk buffers?


I am writing about a hundred files with a size of 50MB each sequentially to a directory on my disk using CreateFile() and WriteFile(). In a second steps, the contents of those files are read using CreateFile() and ReadFile().

I noticed some partially weird things:
If I pass FILE_FLAG_NO_BUFFERING | FILE_FLAG_WRITE_THROUGH when writing the files, reading takes a noticably long time (usually hundreds of milliseconds). However, when I do not pass those flags (but use FlushFileBuffers() instead), writing appears to happen at roughly the same speed but reading those files after writing them is blazingly fast (less than 20 milliseconds per file!).

How is this possible? How do the flags passed when writing 5000MB of data affect reading later? Does the disk cache the whole 5GB in its cache?


Solution

  • When you pass FILE_FLAG_NO_BUFFERING then you are telling the system not to put the data in its disk cache. Then when you read the data, the system has to get the data from the disk.

    When you omit FILE_FLAG_NO_BUFFERING, the system can put the data in its disk cache. And so when you read the data subsequently, it can be read directly from memory, which is faster than disk.

    From https://support.microsoft.com/en-us/kb/99794:

    The FILE_FLAG_WRITE_THROUGH flag for CreateFile() causes any writes made to that handle to be written directly to the file without being buffered. The data is cached (stored in the disk cache); however, it is still written directly to the file. This method allows a read operation on that data to satisfy the read request from cached data (if it's still there), rather than having to do a file read to get the data. The write call doesn't return until the data is written to the file. This applies to remote writes as well--the network redirector passes the FILE_FLAG_WRITE_THROUGH flag to the server so that the server knows not to satisfy the write request until the data is written to the file.

    The FILE_FLAG_NO_BUFFERING takes this concept one step further and eliminates all read-ahead file buffering and disk caching as well, so that all reads are guaranteed to come from the file and not from any system buffer or disk cache.

    You might find this article from Raymond Chen of interest: We’re currently using FILE_FLAG_NO_BUFFERING and FILE_FLAG_WRITE_THROUGH, but we would like our WriteFile to go even faster. An excerpt:

    A customer said that their program’s I/O pattern is to open a file and then every so often write about 100KB of data into the file. They are currently using the FILE_FLAG_NO_BUFFERING and FILE_FLAG_WRITE_THROUGH flags to open a file, and they wanted to know what else they could do to make their writes go even faster.

    Um, for one thing, you stop passing those two flags!

    Those two flags in combination basically mean “Give me the slowest possible I/O performance!” because they force all I/O to go through to the physical media right away.