Tags: asynchronous, rust, async-await, concurrency, rust-tokio

Is there any point in async file IO?


Async runtimes for Rust like tokio provide "asyncified" copies of many standard functions, including some file IO ones, which basically just run the corresponding blocking call as a separate task (on a new thread?). Examples of such functions are tokio::fs::create_dir_all, tokio::fs::read_dir, tokio::fs::read, ...

What's the advantage of all these functions? Why should I prefer them over the standard blocking functions in an async context? If I'm just .awaiting their results, is there any gain at all?

An example would be an async web route that returns the contents of some file based on the query (using Rocket):

#[get("/foo/<query>")]
async fn foo(query: &str) -> Option<String> {
    let file_path = // interpret `query` somehow and find the target file
    tokio::fs::read_to_string(file_path).await.ok()
    // ^ Why not just `std::fs::read_to_string(file_path).ok()`?
}

I understand the gains of async/.await for socket IO or delayed tasks (with thread sleep), but in this case it seems pointless to me. Quite the opposite: it makes more complex tasks much harder to write (working with streams when searching for a file in a list of directories, for example).


Solution

  • I guess you're reading small files on a local filesystem with a pretty fast drive. If that's the case, there may be little point in using the async version of these functions.

    If half of your HTTP requests need to read from the filesystem, then you might start noticing a substantial amount of time where your runtime is waiting for blocking IO. This really depends on the nature of your application. Maybe you have one thread? Maybe you have many?

    However, there are edge-case scenarios where the filesystem can be slow enough to be a really big problem. Here are two extreme corner cases:

    • A network-mounted filesystem (e.g. NFS, IPFS). There can be multiple network round-trips under create_dir_all. While that call is blocking, your service is basically non-responsive.
    • A slow drive, such as a spinning disk or even a CD-ROM drive. Sure, you won't run your web server from a CD-ROM, but a tool that compares whether two magnetic tapes (yes, physical tapes are still used for backups) are identical would suffer greatly if some underlying library were doing blocking IO.
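To make "basically non-responsive" concrete, here is a rough, dependency-free sketch. It uses plain threads rather than tokio, so it is only a model: one worker thread stands in for a runtime with a single executor thread, and a 300 ms sleep stands in for a slow filesystem call. The names `completion_order` and `slow_blocking_read` are made up for this illustration.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;
use std::time::Duration;

type Job = Box<dyn FnOnce() + Send + 'static>;
type Log = Arc<Mutex<Vec<&'static str>>>;

/// Stand-in for a slow filesystem call (assumed: 300 ms, e.g. a read on a slow mount).
fn slow_blocking_read() {
    thread::sleep(Duration::from_millis(300));
}

/// Queues one slow request and two fast ones on a single worker thread and
/// returns the order in which they completed. `offload` decides whether the
/// slow read runs inline on the worker or on a separate helper thread.
fn completion_order(offload: bool) -> Vec<&'static str> {
    let log: Log = Arc::new(Mutex::new(Vec::new()));
    let (tx, rx) = mpsc::channel::<Job>();
    let (helper_tx, helper_rx) = mpsc::channel::<thread::JoinHandle<()>>();

    // The single worker runs queued jobs one after another.
    let worker = thread::spawn(move || {
        for job in rx {
            job();
        }
    });

    // Request 1 needs the slow read.
    let slow_log = Arc::clone(&log);
    tx.send(Box::new(move || {
        if offload {
            // Hand the blocking call to a helper thread; the worker moves on.
            let handle = thread::spawn(move || {
                slow_blocking_read();
                slow_log.lock().unwrap().push("slow");
            });
            helper_tx.send(handle).unwrap();
        } else {
            // Blocking inline: everything queued behind this job has to wait.
            slow_blocking_read();
            slow_log.lock().unwrap().push("slow");
        }
    }))
    .unwrap();

    // Requests 2 and 3 need no IO at all.
    for name in ["fast-1", "fast-2"] {
        let fast_log = Arc::clone(&log);
        tx.send(Box::new(move || {
            fast_log.lock().unwrap().push(name);
        }))
        .unwrap();
    }

    drop(tx); // closing the queue lets the worker loop end
    worker.join().unwrap();
    for handle in helper_rx {
        handle.join().unwrap();
    }

    let order = log.lock().unwrap().clone();
    order
}

fn main() {
    println!("inline:    {:?}", completion_order(false));
    println!("offloaded: {:?}", completion_order(true));
}
```

With the inline variant the two fast requests can only finish after the slow one. With the offloaded variant they should normally finish first, since they no longer queue behind the 300 ms wait.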

    Now, if you're writing a library that exposes an async API, you can't make assumptions about the underlying filesystem or its backing hardware, and should use non-blocking IO.
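As for the complexity concern in the question (searching for a file in a list of directories), one possible approach is to keep the search as ordinary blocking code and move the whole function off the executor in one go. With tokio that would be `tokio::task::spawn_blocking`, which is also how the `tokio::fs` functions are documented to run their `std::fs` counterparts. The sketch below uses `std::thread::spawn` instead so that it runs without dependencies. `find_in_dirs` and `read_first_match` are hypothetical helpers, not part of any library.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;
use std::thread;

/// Hypothetical helper: returns the first `dir/file_name` that exists as a file.
/// Plain blocking code, no streams involved.
fn find_in_dirs(dirs: &[PathBuf], file_name: &str) -> Option<PathBuf> {
    dirs.iter()
        .map(|dir| dir.join(file_name))
        .find(|candidate| candidate.is_file())
}

/// Runs the blocking search and the read together on a separate thread.
/// In a tokio application, `tokio::task::spawn_blocking(move || ...).await`
/// would take the place of `thread::spawn(...)` followed by `.join()`.
fn read_first_match(
    dirs: Vec<PathBuf>,
    file_name: String,
) -> thread::JoinHandle<io::Result<Option<String>>> {
    thread::spawn(move || match find_in_dirs(&dirs, &file_name) {
        Some(path) => fs::read_to_string(path).map(Some),
        None => Ok(None),
    })
}

fn main() -> io::Result<()> {
    // Throwaway directory layout for the demo: <tmp>/.../a and <tmp>/.../b/hello.txt
    let base = std::env::temp_dir().join(format!("find_in_dirs_demo_{}", std::process::id()));
    let (a, b) = (base.join("a"), base.join("b"));
    fs::create_dir_all(&a)?;
    fs::create_dir_all(&b)?;
    fs::write(b.join("hello.txt"), "hi")?;

    let contents = read_first_match(vec![a, b], "hello.txt".to_string())
        .join()
        .expect("search thread panicked")?;
    println!("{:?}", contents);

    fs::remove_dir_all(&base)?;
    Ok(())
}
```

The trade-off in this design is one thread hop per request instead of one per filesystem call, and the search logic stays as simple as it would be in synchronous code.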