
Writes in io_uring do not advance the file offset


I'm writing data to disk with io_uring and noticed that the file offset does not advance after a write request completes. As a result, if I issue two write requests through liburing, the second one overwrites the first, since both write to the beginning of the file. Calling the POSIX APIs directly (write and writev) works as expected, but going through liburing never advances the file offset. The liburing man page says that setting offset to -1 makes the write use and advance the current file offset, but that does not seem to be the case.
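For comparison, the POSIX behavior referred to above can be sketched without io_uring: write(2) uses and advances the file offset, while pwrite(2) writes at an explicit position and leaves the offset alone. As far as I understand the io_uring man pages, a non-negative offset passed to io_uring_prep_write is meant to behave like the pwrite case, and -1 like the write case. The helper names below are made up for illustration and are not part of any library.

```cpp
#include <cassert>
#include <cstdlib>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: open an unnamed scratch file (the name is unlinked immediately).
int open_scratch_file() {
    char path[] = "/tmp/offset_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd >= 0) {
        unlink(path);
    }
    return fd;
}

// write(2) starts at the current file offset and advances it by the bytes written.
// Returns the offset reported by lseek afterwards, or -1 if the write fell short.
off_t offset_after_write(int fd, const char *buf, size_t len) {
    if (write(fd, buf, len) != static_cast<ssize_t>(len)) {
        return -1;
    }
    return lseek(fd, 0, SEEK_CUR);
}

// pwrite(2) writes at the explicit position pos and does not touch the file offset.
// Returns the offset reported by lseek afterwards, or -1 if the write fell short.
off_t offset_after_pwrite(int fd, const char *buf, size_t len, off_t pos) {
    if (pwrite(fd, buf, len, pos) != static_cast<ssize_t>(len)) {
        return -1;
    }
    return lseek(fd, 0, SEEK_CUR);
}
```

Calling offset_after_write twice on a fresh descriptor should move the offset forward each time, whereas offset_after_pwrite should report the same offset before and after. That second pattern is what the io_uring writes in this question appear to be doing even with offset set to -1.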

A small example is below. The expected behavior is that 4096 bytes of numbers from 0 to 1023 are written to the file, followed by 4096 bytes of 0s. However, the file only contains 4096 bytes of 0s. If I remove the line write_buffer_to_file(2), the file contains the 4096 bytes of numbers instead, so the second call seems to be overwriting the output of the first. The fact that lseek always returns 0 confirms that the file offset never changes.

The code snippet was compiled with gcc and run on RHEL 9.3 with kernel 5.14.

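#include <cstdint>   // intptr_t
#include <cstdlib>   // posix_memalign
#include <cstring>
#include <fcntl.h>   // open, O_DIRECT
#include <iostream>
#include <liburing.h>
#include <unistd.h>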

// a small macro to check for errors
#define SYSCALL(expr) if ((expr) < 0) { \
    perror("System call error");        \
}

const int WRITE_SIZE = 4096; // satisfy alignment requirement of O_DIRECT
int fd; // file descriptor
int *buffer; // write buffer
struct io_uring ring;

// write the content of the buffer to fd; the data argument sets user_data in the sqe, which shouldn't affect the result
void write_buffer_to_file(int data) {
    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
    io_uring_sqe_set_data(sqe, (void*)(intptr_t)data);
    // according to the documentation, setting offset to -1 will advance the offset
    // neither 0 nor -1 work in my testing
    io_uring_prep_write(sqe, fd, buffer, WRITE_SIZE, -1);
    SYSCALL(io_uring_submit(&ring))
    std::cout << "Submitted " << sqe->user_data << std::endl;

    // now wait for it to complete
    struct io_uring_cqe *cqe;
    SYSCALL(io_uring_wait_cqe(&ring, &cqe));
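    if (cqe->res < 0) {
        // cqe->res holds -errno; errno itself is not set, so perror would be misleading here
        std::cerr << "Write failed: " << std::strerror(-cqe->res) << std::endl;
    }
    // read the user data before marking the cqe as seen, since the entry may be reused afterwards
    std::cout << "Reaped " << io_uring_cqe_get_data(cqe) << std::endl;
    io_uring_cqe_seen(&ring, cqe);
    // this line always prints 0 even though the offset is supposed to advance by 4096 bytes
    std::cout << "Current offset: " << lseek(fd, 0, SEEK_CUR) << std::endl;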
}

int main() {
    // set up the file and the write buffer
    fd = open("test_file", O_CREAT | O_WRONLY | O_DIRECT, 0744);
    SYSCALL(fd);
    // O_DIRECT has stricter memory alignment requirements
    posix_memalign((void**)&buffer, 512, WRITE_SIZE);
    for (size_t i = 0; i < WRITE_SIZE / sizeof(int); i++) {
        buffer[i] = static_cast<int>(i);
    }
    
    io_uring_queue_init(5, &ring, 0);
    write_buffer_to_file(1);

    // set everything in the buffer to 0 and then write again
    memset(buffer, 0, WRITE_SIZE);
    write_buffer_to_file(2);

    io_uring_queue_exit(&ring);

    close(fd);
    return 0;
}

Solution

  • It turns out this is a problem with the older kernel. Running the same code on Ubuntu 22.04 (kernel 6.2) does not show the problem.

    See this GitHub issue. It would be nice to identify the commit that fixed the bug, but the fix landed a long time ago, which makes it difficult to track down.
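If upgrading the kernel is not an option, one possible workaround is to stop relying on the kernel's file position and pass an explicit offset to io_uring_prep_write. This is my own suggestion rather than something the linked issue confirms. Below is a minimal sketch of the bookkeeping, assuming only one write is in flight at a time, as in the question's code. WriteCursor is a hypothetical helper, not part of liburing.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a liburing type): tracks where the next write should land.
// Assumes writes are submitted and reaped one at a time.
struct WriteCursor {
    std::uint64_t offset = 0;  // byte position for the next write

    // Value to pass as the offset argument of io_uring_prep_write.
    std::uint64_t next() const { return offset; }

    // Call with cqe->res after reaping the completion.
    // A negative res is -errno and leaves the cursor unchanged; otherwise res is
    // the number of bytes written and the cursor advances by that amount.
    bool complete(int res) {
        if (res < 0) {
            return false;
        }
        offset += static_cast<std::uint64_t>(res);
        return true;
    }
};
```

In write_buffer_to_file from the question, this would mean passing cursor.next() instead of -1 as the last argument of io_uring_prep_write, and calling cursor.complete(cqe->res) before io_uring_cqe_seen. Advancing by cqe->res rather than WRITE_SIZE keeps the cursor correct if a write completes short.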