Why does `File::read_to_end` get slower the larger the buffer capacity?

NOTICE: As of 2023-04-23, a fix for this landed on rust-lang/rust:master. You'll soon be able to use File::read_to_end without these worries.

I was working on a pretty specific problem that required me to read hundreds of thousands of files ranging from a few bytes to a few hundred megabytes. As the bulk of the operation consisted of enumerating files and moving data from disk, I resorted to reusing Vec buffers for the file reading, hoping to avoid some of the repeated allocation and deallocation.

That's when I hit the unexpected: file.read_to_end(&mut buffer)? gets progressively slower the larger the buffer's capacity is. It's a lot slower to read a 300MB file first followed by a thousand 1KB files than the other way around (as long as we don't truncate the buffer).

Confusingly, if I wrap the file in a Take or use read_exact(), no slowdown happens.

Does anyone know what that's about? Is it possible it (re)initializes the whole buffer every time it's called? Is this some Windows-specific quirk? What (Windows-based) profiling tools would you recommend when tackling something like this?

Here's a simple reproduction that demonstrates a huge (50x+ on this machine) performance difference between those methods, disk speeds disregarded:

use std::io::Read;
use std::fs::File;

// with a smaller buffer, there's basically no difference between the methods...
// const BUFFER_SIZE: usize = 2 * 1024;

// ...but the larger the Vec, the bigger the discrepancy.
// for simplicity's sake, let's assume this is a hard upper limit.
const BUFFER_SIZE: usize = 300 * 1024 * 1024;


fn naive() {
    let mut buffer = Vec::with_capacity(BUFFER_SIZE);

    for _ in 0..100 {
        let mut file = File::open("some_1kb_file.txt").expect("opening file");

        let metadata = file.metadata().expect("reading metadata");
        let len = metadata.len();
        assert!(len <= BUFFER_SIZE as u64);

        buffer.clear();
        file.read_to_end(&mut buffer).expect("reading file");

        // do "stuff" with buffer
        let check = buffer.iter().fold(0usize, |acc, x| acc.wrapping_add(*x as usize));

        println!("length: {len}, check: {check}");
    }
}

fn take() {
    let mut buffer = Vec::with_capacity(BUFFER_SIZE);

    for _ in 0..100 {
        let file = File::open("some_1kb_file.txt").expect("opening file");

        let metadata = file.metadata().expect("reading metadata");
        let len = metadata.len();
        assert!(len <= BUFFER_SIZE as u64);

        buffer.clear();
        file.take(len).read_to_end(&mut buffer).expect("reading file");

        // this also behaves like the straight `read_to_end` with a significant slowdown:
        // file.take(BUFFER_SIZE as u64).read_to_end(&mut buffer).expect("reading file");

        // do "stuff" with buffer
        let check = buffer.iter().fold(0usize, |acc, x| acc.wrapping_add(*x as usize));

        println!("length: {len}, check: {check}");
    }
}

fn exact() {
    let mut buffer = vec![0u8; BUFFER_SIZE];

    for _ in 0..100 {
        let mut file = File::open("some_1kb_file.txt").expect("opening file");

        let metadata = file.metadata().expect("reading metadata");
        let len = metadata.len() as usize;
        assert!(len <= BUFFER_SIZE);

        // SAFETY: initialized by `vec!` and within capacity by `assert!`
        unsafe { buffer.set_len(len); }
        file.read_exact(&mut buffer[0..len]).expect("reading file");

        // do "stuff" with buffer
        let check = buffer.iter().fold(0usize, |acc, x| acc.wrapping_add(*x as usize));

        println!("length: {len}, check: {check}");
    }
}

fn main() {
    let args: Vec<String> = std::env::args().collect();

    if args.len() < 2 {
        println!("usage: {} <method>", args[0]);
        return;
    }

    match args[1].as_str() {
        "naive" => naive(),
        "take" => take(),
        "exact" => exact(),
        _ => println!("Unknown method: {}", args[1]),
    }
}

I tried a few combinations of --release mode, LTO and even +crt-static, with no significant difference.

Solution

I tried using take with progressively higher numbers:

// Run with different values of `take` from 10_000_000 to 300_000_000
file.take(take)
    .read_to_end(&mut buffer)
    .expect("reading file");

The runtime scaled almost exactly linearly with the take limit.
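If you want to repeat that measurement yourself, a minimal timing sketch could look like the following. The helper name `time_take_reads` and its parameters are made up for illustration; it only assumes `Read::take`, `Read::read_to_end` and `std::time::Instant` from the standard library.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;
use std::time::{Duration, Instant};

/// Hypothetical helper: reads `path` `iterations` times into the same
/// buffer, capping each read at `limit` bytes with `take`, and returns
/// the total wall-clock time spent.
fn time_take_reads(
    path: &Path,
    limit: u64,
    iterations: u32,
    buffer: &mut Vec<u8>,
) -> io::Result<Duration> {
    let start = Instant::now();
    for _ in 0..iterations {
        let file = File::open(path)?;
        // Keep the capacity, drop the previous contents.
        buffer.clear();
        // `take(limit)` bounds how many bytes `read_to_end` may request.
        file.take(limit).read_to_end(buffer)?;
    }
    Ok(start.elapsed())
}
```

Calling it in a loop with limits from 10_000_000 to 300_000_000 and a buffer created with Vec::with_capacity(BUFFER_SIZE) should show how the elapsed time tracks the limit on an affected toolchain.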

Using cargo flamegraph gives a clear picture: NtReadFile takes 95% of the time.

It only takes 10% in the exact version. In other words, your Rust code is not at fault.

The Windows docs don't mention any cost related to the length of the buffer, but from reading the Rust standard library, it does appear that NtReadFile is given the entire spare capacity of the Vec, and the benchmark suggests that NtReadFile does work proportional to the size of the buffer it is handed, not to the number of bytes it actually reads.

I believe the exact method would be best here. std::fs::read also queries the length of the file before reading, although it always has a buffer of the right size since it creates the Vec itself. It still uses read_to_end, so the result stays correct even if the file's length changes between the metadata query and the read. If you want to reuse the Vec, you would need to handle that case some other way.
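As a rough sketch of the exact approach with a reused Vec and no unsafe, something like this could work. `read_into_reused` is a hypothetical helper name, not a std API, and the `as usize` cast assumes the file fits in memory on your target.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Hypothetical helper: reads the whole file at `path` into `buffer`,
/// keeping the buffer's existing allocation. Returns the bytes read.
fn read_into_reused(path: &Path, buffer: &mut Vec<u8>) -> io::Result<usize> {
    let mut file = File::open(path)?;
    // Assumption: the file length fits in `usize` on this target.
    let len = file.metadata()?.len() as usize;

    // Shrink or grow the *length* to exactly `len`; capacity is kept.
    // The zero-fill costs time proportional to the file size, not to
    // the buffer's capacity.
    buffer.clear();
    buffer.resize(len, 0);

    // The OS read call only ever sees a `len`-byte slice.
    file.read_exact(buffer.as_mut_slice())?;
    Ok(len)
}
```

Unlike read_to_end, this returns an UnexpectedEof error if the file shrinks after the metadata call and silently ignores any bytes appended after it, so it only fits if the files are not being modified concurrently.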

Make sure that whatever you choose is actually faster than recreating the Vec every time. In a quick test, recreating it gave nearly the same performance as exact. There are performance benefits to freeing unused memory, so whether reuse makes your program faster will depend on the situation.

You could also separate the code paths for short and long files.
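One way to split the paths, as a sketch only: keep a small reusable buffer for short files and allocate a right-sized Vec for long ones, so neither path hands the OS a huge spare capacity. The 64 KiB cutoff and the names `SMALL_FILE_LIMIT` and `with_file_bytes` are arbitrary choices for illustration; measure before picking a threshold.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

// Hypothetical cutoff between the "short" and "long" paths.
const SMALL_FILE_LIMIT: u64 = 64 * 1024;

/// Hypothetical helper: loads the file and passes its bytes to `process`.
/// Short files go through the reusable `small_buf`; long files get a
/// fresh, exactly sized Vec that is freed on return.
fn with_file_bytes<R, F>(path: &Path, small_buf: &mut Vec<u8>, process: F) -> io::Result<R>
where
    F: FnOnce(&[u8]) -> R,
{
    let mut file = File::open(path)?;
    let len = file.metadata()?.len();

    if len <= SMALL_FILE_LIMIT {
        small_buf.clear();
        // `take(len)` caps the read, as in the `take` variant above.
        file.take(len).read_to_end(small_buf)?;
        Ok(process(small_buf.as_slice()))
    } else {
        // Same idea as `std::fs::read`: size the Vec from the metadata.
        let mut big = Vec::with_capacity(len as usize);
        file.read_to_end(&mut big)?;
        Ok(process(big.as_slice()))
    }
}
```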

Finally, make sure you actually need the entire file in memory. If you can process it one BufReader chunk at a time, using fill_buf and consume, you can avoid this problem entirely.
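For the checksum-style "stuff" in the question, a chunked version might look roughly like this. `checksum_streaming` is a made-up name, and it relies on BufReader's default internal buffer size instead of a caller-provided Vec.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};
use std::path::Path;

/// Hypothetical helper: computes the same wrapping byte sum as the
/// question's `check`, without ever holding the whole file in memory.
fn checksum_streaming(path: &Path) -> io::Result<usize> {
    let mut reader = BufReader::new(File::open(path)?);
    let mut check = 0usize;

    loop {
        // Borrow whatever BufReader currently has buffered,
        // refilling from the file if the buffer is empty.
        let chunk = reader.fill_buf()?;
        if chunk.is_empty() {
            break; // end of file
        }
        check = chunk
            .iter()
            .fold(check, |acc, &b| acc.wrapping_add(b as usize));

        // Mark the chunk as processed so the next `fill_buf` moves on.
        let consumed = chunk.len();
        reader.consume(consumed);
    }

    Ok(check)
}
```

This only helps if your processing can work on arbitrary chunk boundaries; anything that needs random access to the full contents still has to load the file.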