
The difference between asynchronous I/O in concurrent.futures and POSIX Linux


When speaking about asynchronous I/O, I want to understand the difference between the POSIX interface used in Linux and the concurrent.futures interface used in Python. I use the former when I want to achieve asynchronous I/O in C code, and the latter in Python code. I understand that concurrent.futures in Python is a thread-based technique that attaches a callback to a thread so that it can be polled later for its status. However, I don't know how POSIX AIO works! Is it thread based as well?

Thank you


Solution

  • concurrent.futures is not specifically thread based (there are thread- and process-based executors available), nor is it specifically about async I/O; it's a general parallelism framework. You can parallelize I/O with it, but it's the worker tasks that are async; I/O is just one kind of work those tasks can perform.

    As it happens, for I/O, you would want to use the ThreadPoolExecutor; CPython's GIL isn't a problem for I/O bound tasks, and the IPC necessary to return results from a ProcessPoolExecutor's worker processes would largely eliminate the benefits of parallelizing the I/O. I just wanted to be clear that concurrent.futures is not purely about threads.
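For instance, here is a minimal sketch of parallelizing file reads with ThreadPoolExecutor (the file names and contents are invented for illustration):

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Create a few small files to read back concurrently (setup for the demo).
tmpdir = tempfile.mkdtemp()
paths = []
for i in range(4):
    path = os.path.join(tmpdir, f"part{i}.txt")
    with open(path, "w") as f:
        f.write(f"contents of part {i}")
    paths.append(path)

def read_file(path):
    # Each call blocks in an OS read; CPython releases the GIL while
    # waiting on I/O, so several reads can be in flight at once.
    with open(path) as f:
        return f.read()

# map() dispatches the reads across worker threads and yields results
# in input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(read_file, paths))
```

Because the results here are plain strings, a ProcessPoolExecutor would add pickling and IPC cost for no benefit, which is why threads are the right choice for I/O-bound work like this.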

    POSIX AIO is, at least on Linux, just a user space library wrapping threads (roughly equivalent to using concurrent.futures.ThreadPoolExecutor to perform your I/O tasks), per the NOTES in the man page you linked:

    The current Linux POSIX AIO implementation is provided in user space by glibc. This has a number of limitations, most notably that maintaining multiple threads to perform I/O operations is expensive and scales poorly. Work has been in progress for some time on a kernel state-machine-based implementation of asynchronous I/O (see io_submit(2), io_setup(2), io_cancel(2), io_destroy(2), io_getevents(2)), but this implementation hasn't yet matured to the point where the POSIX AIO implementation can be completely reimplemented using the kernel system calls.

    Point is, in both cases, it's fundamentally about dispatching I/O requests in background threads with handles of some sort to allow polling and retrieval of results.
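In concurrent.futures terms, that handle is the Future returned by submit(). A small sketch of the dispatch/poll/retrieve cycle (the slow_io function is a made-up stand-in for a blocking read):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_io():
    # Stand-in for a blocking I/O operation running in a worker thread.
    time.sleep(0.2)
    return b"data"

pool = ThreadPoolExecutor(max_workers=1)
fut = pool.submit(slow_io)   # dispatch; returns immediately with a handle

# Poll the handle, much like looping on aio_error() until it stops
# returning EINPROGRESS.
while not fut.done():
    time.sleep(0.05)         # a real caller would do other work here

result = fut.result()        # retrieve the result, analogous to aio_return()
pool.shutdown()
```

You can also attach a callback with fut.add_done_callback() instead of polling, which mirrors the sigevent-based completion notification that POSIX AIO offers.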

    Kernel supported async I/O could avoid or limit threading by any of the following:

    1. Internally managing the I/O request queues, perhaps merely dispatching them serially, working with the disk driver to order the requests such that the head seeks across them and pulls them efficiently
    2. Dispatching in parallel, and responding to device interrupts to signal completion
    3. Using a shared thread pool (similar to user space, but lower overhead since the whole OS can share the pool)

    but none of these techniques are actually used in Linux's implementation of POSIX AIO, and if any of them were used in Python via concurrent.futures, it would be a hand-rolled solution (since, as mentioned, concurrent.futures performs arbitrary parallelism and has no special support for I/O).
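To make the "roughly equivalent" claim concrete, here is a hand-rolled sketch of a POSIX-AIO-style submit/poll/retrieve interface built on ThreadPoolExecutor. The AioRequest class and its method names are invented for this example; it only mimics the shape of aio_read()/aio_error()/aio_return(), not their exact semantics:

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

class AioRequest:
    """Hypothetical stand-in for a POSIX aiocb: the read runs in a
    worker thread, and the caller polls the handle for completion."""
    _pool = ThreadPoolExecutor(max_workers=4)

    def __init__(self, fd, offset, nbytes):
        # os.pread(fd, n, offset) blocks in the worker thread, not the caller.
        self._future = self._pool.submit(os.pread, fd, nbytes, offset)

    def in_progress(self):
        # Roughly: aio_error() returning EINPROGRESS.
        return not self._future.done()

    def result(self):
        # Roughly: aio_return(); raises if the read failed.
        return self._future.result()

# Demo: write a file, then read a slice of it through the wrapper.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello world")
    name = tmp.name

fd = os.open(name, os.O_RDONLY)
req = AioRequest(fd, offset=6, nbytes=5)
while req.in_progress():
    pass                     # a real caller would do other work here
data = req.result()          # b"world"
os.close(fd)
os.unlink(name)
```

This is essentially what glibc's POSIX AIO does internally, which is why the man page notes that it "is provided in user space" and scales poorly as thread counts grow.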