Tags: c#, linux, .net-core, clr, .net-6.0

How is non-blocking IO for regular files implemented in .NET on Linux?


As far as I know, all IO on regular files is always blocking on Linux (see here). However, you can still call File.ReadBLAHAsync(...)/File.WriteBLAHAsync(...) or do other file-related work just fine.

Are these wrappers faking the async call just to stay backward compatible, or somehow to keep the sync context satisfied?


Solution

  • It's worth pointing out that there are multiple contexts at play here.

    The Linux operating system

    From Non-Blocking descriptors:

    By default, read on any descriptor blocks if there’s no data available. The same applies to write or send. This applies to operations on most descriptors except disk files, since writes to disk never happen directly but via the kernel buffer cache as a proxy. The only time when writes to disk happen synchronously is when the O_SYNC flag was specified when opening the disk file.

    Any descriptor (pipes, FIFOs, sockets, terminals, pseudo-terminals, and some other types of devices) can be put in the nonblocking mode. When a descriptor is set in nonblocking mode, an I/O system call on that descriptor will return immediately, even if that request can’t be immediately completed (and will therefore result in the process being blocked otherwise). The return value can be either of the following:

    • an error: when the operation cannot be completed at all
    • a partial count: when the input or output operation can be partially completed
    • the entire result: when the I/O operation could be fully completed

    As explained above, non-blocking descriptors keep reads and writes on pipes (or sockets, or ...) from blocking indefinitely while waiting for the other side. They were never meant for disk files, however: whether you want to read an entire file or just part of it, the data is already there. It isn't going to arrive at some point in the future, so a regular file always counts as ready, even though fetching its contents from disk can still take time.
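The pipe-versus-disk-file difference can be observed from .NET by calling libc directly. This is a sketch added for illustration, not part of the original answer, and it rests on assumptions: Linux with glibc (loaded as libc.so.6), the x86-64/arm64 values of O_NONBLOCK, F_GETFL and EAGAIN, and C# 9 or later for top-level statements and extern local functions. An empty non-blocking pipe makes read() fail immediately with EAGAIN, whereas a regular file opened with O_NONBLOCK keeps the flag (as fcntl(F_GETFL) shows) but read() simply returns the data. It demonstrates the semantics only; with the file already in the page cache it cannot show how long a cold disk read would stall.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;
using System.Text;

// Assumed Linux/glibc values for x86-64 and arm64 (see <fcntl.h> / <errno.h>).
const int O_RDONLY = 0;
const int O_NONBLOCK = 0x800;
const int F_GETFL = 3;
const int EAGAIN = 11;

byte[] buffer = new byte[4096];

// 1) An empty pipe in non-blocking mode: read() fails at once with EAGAIN.
int[] fds = new int[2];
if (pipe2(fds, O_NONBLOCK) != 0)
    throw new InvalidOperationException($"pipe2 failed, errno {Marshal.GetLastWin32Error()}");
nint pipeResult = read(fds[0], buffer, (nuint)buffer.Length);
int pipeErrno = Marshal.GetLastWin32Error();
close(fds[0]);
close(fds[1]);
Console.WriteLine($"pipe: read returned {pipeResult}, errno {pipeErrno} (EAGAIN = {EAGAIN})");

// 2) A regular file opened with O_NONBLOCK: the flag bit is stored...
string path = Path.GetTempFileName();
File.WriteAllText(path, "hello from a regular file");
int fd = open(path, O_RDONLY | O_NONBLOCK);
if (fd < 0)
    throw new InvalidOperationException($"open failed, errno {Marshal.GetLastWin32Error()}");
bool flagSet = (fcntl(fd, F_GETFL) & O_NONBLOCK) != 0;

// ...but read() behaves like a blocking read: it returns the data
// (sleeping in the kernel if the disk is slow) and never reports EAGAIN.
nint fileResult = read(fd, buffer, (nuint)buffer.Length);
close(fd);
File.Delete(path);
string text = Encoding.UTF8.GetString(buffer, 0, (int)fileResult);
Console.WriteLine($"file: O_NONBLOCK set = {flagSet}, read returned {fileResult} bytes: \"{text}\"");

// Direct libc bindings (C# 9+ extern local functions).
[DllImport("libc.so.6", SetLastError = true)]
static extern int pipe2([Out] int[] pipefd, int flags);

[DllImport("libc.so.6", SetLastError = true)]
static extern int open(string pathname, int flags);

[DllImport("libc.so.6", SetLastError = true)]
static extern int fcntl(int fd, int cmd);

[DllImport("libc.so.6", SetLastError = true)]
static extern nint read(int fd, [Out] byte[] buf, nuint count);

[DllImport("libc.so.6", SetLastError = true)]
static extern int close(int fd);
```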

    Quoting your linked post:

    Regular files are always readable and they are also always writeable. This is clearly stated in the relevant POSIX specifications. I cannot stress this enough. Putting a regular file in non-blocking has ABSOLUTELY no effects other than changing one bit in the file flags.

    Reading from a regular file might take a long time. For instance, if it is located on a busy disk, the I/O scheduler might take so much time that the user will notice that the application is frozen.

    Nevertheless, non-blocking mode will not fix it. It will simply not work. Checking a file for readability or writeability always succeeds immediately. If the system needs time to perform the I/O operation, it will put the task in non-interruptible sleep from the read or write system call. In other words, if you can assume that a file descriptor refers to a regular file, do not waste your time (or worse, other people's time) in implementing non-blocking I/O.

    The only safe way to read data from or write data to a regular file while not blocking a task... consists of not performing the operation, not in that particular task anyway. Concretely, you need to create a separate thread (or process), or use asynchronous I/O (functions whose name starts with aio_). Whether you like it or not, and even if you think multiple threads suck, there are no other options.
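In .NET, the "separate thread" option can be sketched as follows. ReadOnDedicatedThreadAsync is a hypothetical helper, not a framework API: it runs the ordinary blocking File.ReadAllBytes on a dedicated background thread and reports completion through a TaskCompletionSource, so the caller can await it while only the worker thread sits in the blocking read. The temporary file exists only for the demo.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

string path = Path.GetTempFileName();
File.WriteAllText(path, "payload");

Task<byte[]> pending = ReadOnDedicatedThreadAsync(path);

// The calling thread is free to do other work here; the blocking read
// happens on the worker thread.
Console.WriteLine($"caller thread {Environment.CurrentManagedThreadId} keeps running");

byte[] data = await pending;
File.Delete(path);
Console.WriteLine($"read {data.Length} bytes");

// Hypothetical helper (not a framework API): runs the blocking read on its
// own background thread and surfaces the result as a Task.
static Task<byte[]> ReadOnDedicatedThreadAsync(string filePath)
{
    var tcs = new TaskCompletionSource<byte[]>(
        TaskCreationOptions.RunContinuationsAsynchronously);

    var worker = new Thread(() =>
    {
        try
        {
            tcs.SetResult(File.ReadAllBytes(filePath)); // only this thread blocks
        }
        catch (Exception ex)
        {
            tcs.SetException(ex);
        }
    })
    {
        IsBackground = true,
    };

    worker.Start();
    return tcs.Task;
}
```

RunContinuationsAsynchronously keeps the awaiting code from running inline on the worker thread when SetResult is called.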

    The .NET runtime

    It implements the async/await pattern so that the calling thread (for example, a UI thread or a thread-pool thread serving requests) is not tied up while I/O is being performed. As mentioned above:

    Concretely, you need to create a separate thread (or process), or use asynchronous I/O (functions whose name starts with aio_). Whether you like it or not, and even if you think multiple threads suck, there are no other options.

    The .NET thread pool spawns additional threads as needed (see why is .NET spawning multiple processes on Linux; tools like htop can list those threads as if they were separate processes). Ideally, when FileStream.ReadAsync(...) or FileStream.WriteAsync(...) (or one of the File.*Async helpers) is called, the current thread-pool thread initiates the I/O operation and then gives up control, freeing it to do other work. Before it does, a continuation is attached to the I/O operation, so when the I/O device signals that the operation has finished, the next free thread-pool thread can pick up the continuation.
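A minimal sketch of that pattern with the public API, assuming .NET 6 and a temporary file created just for the demo: the FileStream is opened with FileOptions.Asynchronous, ReadAsync is awaited, and everything after the await is the continuation. The two thread IDs are printed only for illustration; in a console app without a synchronization context the continuation may resume on a different thread-pool thread, so nothing should depend on them.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

string path = Path.GetTempFileName();
await File.WriteAllTextAsync(path, "hello async");

byte[] buffer = new byte[64];
int threadBefore = Environment.CurrentManagedThreadId;
int bytesRead;

// FileOptions.Asynchronous requests an async handle; on Linux (.NET 6) the
// actual read still runs synchronously on a thread-pool thread.
await using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                                     FileShare.Read, bufferSize: 4096,
                                     options: FileOptions.Asynchronous))
{
    // If the read has not finished yet, the method returns at this await;
    // the code after it runs as a continuation once the read completes.
    bytesRead = await fs.ReadAsync(buffer.AsMemory());
}

int threadAfter = Environment.CurrentManagedThreadId;
File.Delete(path);

string text = Encoding.UTF8.GetString(buffer, 0, bytesRead);
Console.WriteLine($"started on thread {threadBefore}, resumed on thread {threadAfter}: \"{text}\"");
```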

    To be clear, this is all about responsiveness. Any code that needs the result of the I/O still has to wait for it; it just doesn't "block" the application while it waits.

    Back to the OS

    On Windows, the thread can genuinely give up control and be freed while the disk I/O is in flight, since Windows supports asynchronous (overlapped) file I/O, with some caveats described here:

    https://learn.microsoft.com/en-us/troubleshoot/windows/win32/asynchronous-disk-io-synchronous

    Asynchronous file I/O hasn't been part of Linux for very long, so the flow there is different; it is described at:

    https://devblogs.microsoft.com/dotnet/file-io-improvements-in-dotnet-6/#unix

    Unix-like systems don’t expose async file IO APIs (except of the new io_uring which we talk about later). Anytime user asks FileStream to perform async file IO operation, a synchronous IO operation is being scheduled to Thread Pool. Once it’s dequeued, the blocking operation is performed on a dedicated thread.
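The quoted behaviour can be approximated in user code to make it concrete. This is not the runtime's actual implementation, only a sketch of the same idea assuming .NET 6 (which introduced File.OpenHandle and RandomAccess): a synchronous positional read is queued to the thread pool with Task.Run, so the blocking call ties up a pool thread rather than the caller.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Win32.SafeHandles;

string path = Path.GetTempFileName();
File.WriteAllText(path, "offloaded read");

byte[] buffer = new byte[64];
bool ranOnThreadPool = false;
int bytesRead;

using (SafeFileHandle handle = File.OpenHandle(path, FileMode.Open, FileAccess.Read))
{
    // Schedule the *synchronous* read to the thread pool, roughly what the
    // blog post describes for async FileStream calls on Unix in .NET 6.
    bytesRead = await Task.Run(() =>
    {
        ranOnThreadPool = Thread.CurrentThread.IsThreadPoolThread;
        return RandomAccess.Read(handle, buffer, fileOffset: 0); // blocks this pool thread
    });
}

File.Delete(path);
string text = Encoding.UTF8.GetString(buffer, 0, bytesRead);
Console.WriteLine($"read on thread-pool thread: {ranOnThreadPool}, data: \"{text}\"");
```

Compared with a dedicated thread, the pool thread is reused once the read returns; while the read is in progress, though, it is just as blocked.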

    A similar flow is suggested for Python's asyncio:

    asyncio does not support asynchronous operations on the filesystem. Even if files are opened with O_NONBLOCK, read and write will block.

    ...

    The Linux kernel provides asynchronous operations on the filesystem (aio), but it requires a library and it doesn't scale with many concurrent operations. See aio.

    ...

    For now, the workaround is to use aiofiles that uses threads to handle files.

    Closing thoughts

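    Linux's non-blocking descriptors (and the readiness polling built around them) are not what makes async I/O tick on Windows. Windows file I/O is completion-based: the OS notifies the application once an operation has finished, instead of reporting that a descriptor is ready.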

    As mentioned by @Damien_The_Unbeliever, there's a relatively new Linux kernel interface, io_uring, that allows a completion-based flow similar to the one on Windows. However, the following links confirm that .NET 6 does not use it yet: