windows winapi etw windows-kernel

How to investigate disk cache usage in a Win32 application?


I have a workload similar to the following:

while True:
    data = get_data_from_network()
    filename = sha1(data)
    write_to_file(filename, data, len(data))

Occasionally I read back from the file, but it's not very common. Importantly, I get a lot of these network requests. It's not uncommon for me to write a gigabyte of data out to the disk this way. So for the most part I'm effectively just streaming large volumes of data to the disk. There is this article from Raymond Chen where he advises the customer not to use the FILE_FLAG_NO_BUFFERING flag, because as Raymond puts it:

If the application reads back from the file, the read can be satisfied from the disk cache, avoiding the physical I/O entirely

But I'm not sure if this applies to me, because depending on the size of the cache, there's a pretty good chance that by the time I go to read that data again, it's already been pushed out by some other data.

I can bypass the cache with FILE_FLAG_NO_BUFFERING when I call CreateFile(), but before I just go and blindly do this, I'm wondering how I can investigate the performance impact. I can just time my application, sure, but I'd like to go deeper.

For starters, how big even is the OS cache? Is it per-process, per-file, global? Is the size configurable? Can I query its size programmatically via an API? Is there a way for me to investigate if it's being thrashed due to my workload? Is there a way to run my program and then determine how many disk reads were served from the memory cache as opposed to from the physical media?


Solution

  • You could use the Windows Performance Toolkit, which is part of the Windows SDK, to analyze ETW data. Recording is easy:

    wpr -start CPU -start DiskIO -start FileIO 
    Run your use case now, but record for no more than a few minutes, because the trace is kept in a ring buffer and the oldest data gets overwritten.
    wpr -stop c:\temp\IOTrace.etl
    

    Then you can analyze the data in WPA. For your case the most important graphs are File I/O and Disk Usage.


    Disk Usage shows the actual (uncached) accesses to the physical disk, while File I/O shows all file operations regardless of whether they were served from the cache. If you flush the cache, you will later see high disk I/O from reading data that could previously have come from the cache. Windows keeps the files it has read on the Standby list, which is essentially memory that is otherwise free. If you allocate all physical memory, no disk cache is left. You can see the current size in Task Manager: open the Memory tab and hover over the third region of the memory composition bar.

    To see the actual file system contents you can use RAMMap from Sysinternals, which shows which files are held on the Standby list and the other OS-managed lists.
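The question also asks whether the cache size can be queried through an API. One option, as far as I know, is the Win32 `GetPerformanceInfo` function, whose `SystemCache` field is documented as the standby list plus the system working set, in pages. Below is a minimal sketch using Python's `ctypes`; the helper name `system_cache_bytes` is made up for this example, and it returns `None` when not running on Windows.

```python
import ctypes
import sys


class PERFORMANCE_INFORMATION(ctypes.Structure):
    # Field order mirrors the Win32 PERFORMANCE_INFORMATION struct (psapi.h).
    # DWORD -> c_uint32, SIZE_T -> c_size_t.
    _fields_ = [
        ("cb", ctypes.c_uint32),
        ("CommitTotal", ctypes.c_size_t),
        ("CommitLimit", ctypes.c_size_t),
        ("CommitPeak", ctypes.c_size_t),
        ("PhysicalTotal", ctypes.c_size_t),
        ("PhysicalAvailable", ctypes.c_size_t),
        ("SystemCache", ctypes.c_size_t),
        ("KernelTotal", ctypes.c_size_t),
        ("KernelPaged", ctypes.c_size_t),
        ("KernelNonpaged", ctypes.c_size_t),
        ("PageSize", ctypes.c_size_t),
        ("HandleCount", ctypes.c_uint32),
        ("ProcessCount", ctypes.c_uint32),
        ("ThreadCount", ctypes.c_uint32),
    ]


def system_cache_bytes():
    """Hypothetical helper: system cache size in bytes, or None off Windows."""
    if sys.platform != "win32":
        return None
    psapi = ctypes.WinDLL("psapi", use_last_error=True)
    psapi.GetPerformanceInfo.argtypes = [
        ctypes.POINTER(PERFORMANCE_INFORMATION),
        ctypes.c_uint32,
    ]
    psapi.GetPerformanceInfo.restype = ctypes.c_int  # Win32 BOOL
    info = PERFORMANCE_INFORMATION()
    info.cb = ctypes.sizeof(info)
    if not psapi.GetPerformanceInfo(ctypes.byref(info), info.cb):
        raise ctypes.WinError(ctypes.get_last_error())
    # SystemCache is reported in pages, so convert to bytes.
    return info.SystemCache * info.PageSize


if __name__ == "__main__":
    size = system_cache_bytes()
    if size is None:
        print("GetPerformanceInfo is only available on Windows")
    else:
        print(f"system cache: {size / 2**20:.0f} MiB")
```

Sampling this value before, during, and after the workload gives a coarse picture of how the cache grows; it is a system-wide number, so it does not say which files occupy the cache (RAMMap does that).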

    A more detailed explanation about the ETW view can be found at https://aloiskraus.wordpress.com/2016/10/09/how-buffered-io-can-ruin-performance/
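To turn the two WPA views into a single number, one rough approach is to compare the bytes read according to File I/O with the bytes read according to Disk Usage, both filtered to the same process and files. The sketch below is only that arithmetic; the function name and the sample figures are hypothetical, and the result is an approximation because read-ahead and other processes can inflate the disk side.

```python
def cache_hit_ratio(file_read_bytes, disk_read_bytes):
    """Estimate the fraction of read bytes served from the cache.

    file_read_bytes: total bytes read as seen in the File I/O table.
    disk_read_bytes: total bytes read as seen in the Disk Usage table.
    Both values are assumed to be copied by hand from the same trace.
    """
    if file_read_bytes <= 0:
        raise ValueError("file_read_bytes must be positive")
    # Read-ahead can make the disk figure exceed the file figure,
    # so clamp the cached share at zero.
    served_from_cache = max(file_read_bytes - disk_read_bytes, 0)
    return served_from_cache / file_read_bytes


if __name__ == "__main__":
    MiB = 2**20
    # Hypothetical totals: 1000 MiB requested, 250 MiB hit the disk.
    ratio = cache_hit_ratio(1000 * MiB, 250 * MiB)
    print(f"estimated cache hit ratio: {ratio:.0%}")
```

A ratio close to zero for the read-back path would suggest the cached data is being evicted before it is reused, which is the thrashing scenario the question is worried about.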