Search code examples
linuxunixaio

Linux async (io_submit) write v/s normal (buffered) write


Since writes are immediate anyway (copy to kernel buffer and return), what's the advantage of using io_submit for writes?

In fact, it (aio/io_submit) seems worse since you have to allocate the write buffers on the heap and can't use stack-based buffers.

My question is only about writes, not reads.

EDIT: I am talking about relatively small writes (few KB at most), not MB or GB, so buffer copy should not be a big problem.


Solution

  • Copying a buffer into the kernel is not necessarily instantaneous.

    First the kernel needs to find a free page. If there is none (which is fairly likely under heavy disk-write pressure), it has to decide to evict one. If it decides to evict a dirty page (instead of evicting your process for instance), it will have to actually write it before it can use that page.

    there's a related issue in linux when saturating writing to a slow drive, the page cache fills up with dirty pages backed by a slow drive. Whenever the kernel needs a page, for any reason, it takes a long time to acquire one and the whole system freezes as a result.

    The size of each individual write is less relevant than the write pressure of the system. If you have a million small writes already queued up, this may be the one that has to block.

    Regarding whether the allocation lives on the stack or the heap is also less relevant. If you want efficient allocation of blocks to write, you can use a dedicated pool allocator (from the heap) and not pay for the general purpose heap allocator.

    aio_write() gets around this by not copying the buffer into the kernel at all, it may even be DMAd straight out of your buffer (given the alignment requirements), which means you're likely to save a copy as well.