python, linux, disk-io

How can I do an unbuffered disk read in Python?


I need to read a sector from the physical disk, but without using the system cache.

I tried this:

import os

disk_path = "/dev/sdc"

try:
    disk_fd = os.open(disk_path, os.O_RDONLY | os.O_DIRECT)
    os.lseek(disk_fd, 12345 * 4096, os.SEEK_SET)
    buffer = os.read(disk_fd, 4096)

finally:
    if disk_fd: os.close(disk_fd)

But I get an error:

Traceback (most recent call last):
  File "/home/marus/direct.py", line 8, in <module>
    buffer = os.read(disk_fd, 4096)
OSError: [Errno 22] Invalid argument

On Windows I know there are alignment requirements for unbuffered file reads, but I don't know what the rules are on Linux. What could be wrong here? I ran the script with sudo.

Edit: If I remove the os.O_DIRECT flag, everything works fine...

Update: I preallocated an aligned buffer like this:

buffer_address = ctypes.create_string_buffer(buffer_size + sector_size)
buffer_offset = (ctypes.addressof(buffer_address) + sector_size - 1) & ~(sector_size - 1)
buffer = ctypes.string_at(buffer_offset, buffer_size)

...but how can I now use this buffer with os.read()?


Solution

  • man 2 open:

       Since Linux 2.6.0, alignment to the logical block size of the
       underlying storage (typically 512 bytes) suffices.
    

    man 2 read:

       EINVAL fd is attached to an object which is unsuitable for reading; or the file was opened with the O_DIRECT
              flag, and either the address specified in buf, the value specified in count, or the file offset is not
              suitably aligned.
    

    I.e. the file offset, the read length, and the address of the buffer in memory must all be aligned to the logical block size (typically 512 bytes). You can control the file position with lseek, but getting an aligned read buffer in Python requires a different approach.
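
    As a rough illustration of the alignment arithmetic, the sketch below rounds an arbitrary byte range out to block boundaries. The helper names (align_down, align_up, aligned_range) are made up for this example, and the 512-byte default is only the typical logical block size; the real value for a device can be read from /sys/block/<dev>/queue/logical_block_size.

```python
def align_down(value, block_size=512):
    """Round value down to a multiple of block_size (must be a power of two)."""
    return value & ~(block_size - 1)


def align_up(value, block_size=512):
    """Round value up to a multiple of block_size (must be a power of two)."""
    return (value + block_size - 1) & ~(block_size - 1)


def aligned_range(offset, length, block_size=512):
    """Return (start, size) of the smallest block-aligned range that
    covers the bytes [offset, offset + length)."""
    start = align_down(offset, block_size)
    end = align_up(offset + length, block_size)
    return start, end - start
```

    For the read in the question, aligned_range(12345 * 4096, 4096) returns the same offset and length unchanged, so those two were already fine, which leaves the buffer address as the likely culprit.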

    See https://bugs.python.org/issue5396 for the details and remedy.

    Here's another discussion: Direct I/O in Python with O_DIRECT

    Here is what works for me:

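    import os
    import mmap

    disk_path = "/dev/sdc"

    disk_fd = None
    try:
        # O_DIRECT bypasses the page cache but needs an aligned offset, length and buffer
        disk_fd = os.open(disk_path, os.O_RDONLY | os.O_DIRECT)
        os.lseek(disk_fd, 12345 * 4096, os.SEEK_SET)
        # Unbuffered file object wrapping the descriptor (buffering=0)
        f = os.fdopen(disk_fd, 'rb+', 0)
        # Anonymous mmap memory starts on a page boundary
        m = mmap.mmap(-1, 4096)
        # Read straight into the aligned buffer
        f.readinto(m)
        print(m.read(4096))
    finally:
        # fd 0 is a valid descriptor, so compare against None
        if disk_fd is not None:
            os.close(disk_fd)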
    

    For me, the code above prints the contents of that block.

    This is how it works:

        m = mmap.mmap(-1, 4096)
    

    This allocates 4 KiB of anonymous memory using mmap. The returned memory is aligned to the start of a memory page, and the page size is usually 4 KiB as well, so the 512-byte requirement is met.
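
    If you want to check the alignment claim on your own machine, a small sketch like the following should do. It uses ctypes only to look up the address of the mapping; the variable names are mine.

```python
import ctypes
import mmap

# Anonymous mapping (fd -1), the same call the answer uses.
m = mmap.mmap(-1, 4096)

# ctypes can report the address of the memory behind the mapping.
view = ctypes.c_char.from_buffer(m)
address = ctypes.addressof(view)

# True when the mapping starts on a page boundary.
page_aligned = address % mmap.PAGESIZE == 0
print(hex(address), page_aligned)

# Drop the ctypes view before closing; close() raises BufferError
# while something still holds a reference to the mapping's memory.
del view
m.close()
```

    A page-aligned address is also a multiple of 512, which is what O_DIRECT needs here.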

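    os.read returns a new bytes object and cannot read into a buffer that you supply. So the trick they use is to create a file object from the OS file descriptor:

        f = os.fdopen(disk_fd, 'rb+', 0)
    

    The above creates f, which is similar to what you get from a normal open: it has read/write/close methods. The third argument, 0, turns off Python's own buffering, so reads go straight to the descriptor.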

    Then they call

        f.readinto(m)
    

    which reads from the file object directly into the pre-allocated block of memory.

    The final step is to extract the data from the mmapped memory block:

        print(m.read(4096))
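
    As a possible alternative, assuming Python 3.7 or later on Linux: os.preadv reads at a given offset directly into writable buffers, so the file object and the separate lseek are not needed. The sketch below is untested against a real block device; read_block and its parameters are names invented for this example, and the direct switch exists only so the function can be tried on an ordinary file whose filesystem rejects O_DIRECT.

```python
import mmap
import os


def read_block(path, block_index, block_size=4096, direct=True):
    """Read one block at block_index * block_size into page-aligned memory
    and return its contents as bytes."""
    flags = os.O_RDONLY
    if direct:
        flags |= os.O_DIRECT  # Linux-only flag
    fd = os.open(path, flags)
    try:
        # Anonymous mmap memory starts on a page boundary.
        buf = mmap.mmap(-1, block_size)
        try:
            # preadv fills the given buffers in place, starting at the offset,
            # and returns the number of bytes actually read.
            n = os.preadv(fd, [buf], block_index * block_size)
            return buf[:n]
        finally:
            buf.close()
    finally:
        os.close(fd)
```

    With this, read_block("/dev/sdc", 12345) would correspond to the read shown above.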