I need to read a sector from the physical disk, but without using the system cache.
I tried this:
import os
disk_path = "/dev/sdc"
try:
    disk_fd = os.open(disk_path, os.O_RDONLY | os.O_DIRECT)
    os.lseek(disk_fd, 12345 * 4096, os.SEEK_SET)
    buffer = os.read(disk_fd, 4096)
finally:
    if disk_fd: os.close(disk_fd)
But I get an error:
Traceback (most recent call last):
  File "/home/marus/direct.py", line 8, in <module>
    buffer = os.read(disk_fd, 4096)
OSError: [Errno 22] Invalid argument
On Windows I know that there are some alignment requirements for unbuffered file reading, but I don't know how it works on Linux. What could be wrong here? I ran the script with sudo.
Edit:
If I remove the os.O_DIRECT flag, everything works fine.
Update: I preallocated an aligned buffer like this:
buffer_address = ctypes.create_string_buffer(buffer_size + sector_size)
buffer_offset = (ctypes.addressof(buffer_address) + sector_size - 1) & ~(sector_size - 1)
buffer = ctypes.string_at(buffer_offset, buffer_size)
...but now how can I use this buffer with os.read()?
man 2 open:
Since Linux 2.6.0, alignment to the logical block size of the underlying storage (typically 512 bytes) suffices.
man 2 read:
EINVAL fd is attached to an object which is unsuitable for reading; or the file was opened with the O_DIRECT
flag, and either the address specified in buf, the value specified in count, or the file offset is not
suitably aligned.
I.e. the file position, the read length, and the buffer address in memory should all be aligned to the logical block size (typically 512 bytes). You can control the file position with lseek, but the read buffer in Python requires a different approach.
See https://bugs.python.org/issue5396 for the details and remedy.
Here's another discussion: Direct I/O in Python with O_DIRECT
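To make the three conditions from the man page concrete, here is a minimal sketch of a checker. The function name direct_io_problems, the default block size of 512, and the sample buffer address are all assumptions made for illustration; the real logical block size depends on the device.

```python
def direct_io_problems(offset, count, buffer_address, block_size=512):
    """List the alignment problems an O_DIRECT read would have.

    block_size is an assumed logical block size: 512 is typical,
    but some devices use 4096.
    """
    problems = []
    if offset % block_size:
        problems.append("file offset is not a multiple of the block size")
    if count % block_size:
        problems.append("read length is not a multiple of the block size")
    if buffer_address % block_size:
        problems.append("buffer address is not a multiple of the block size")
    return problems


# Offset and length from the question are aligned; the buffer address
# below is a made-up value that is off by 16 bytes.
print(direct_io_problems(12345 * 4096, 4096, 0x7F0000001010))
```

In the question the offset and the length are already fine, so the buffer that os.read allocates internally is the likely culprit.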
Here is what works for me:
import os
import mmap
disk_path = "/dev/sdc"
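disk_fd = None
try:
    disk_fd = os.open(disk_path, os.O_RDONLY | os.O_DIRECT)
    os.lseek(disk_fd, 12345 * 4096, os.SEEK_SET)
    # Wrap the descriptor in an unbuffered file object to get readinto().
    f = os.fdopen(disk_fd, 'rb+', 0)
    # Anonymous mmap: page-aligned, writable memory.
    m = mmap.mmap(-1, 4096)
    f.readinto(m)
    print(m.read(4096))
finally:
    # Compare against None: 0 is a valid file descriptor.
    if disk_fd is not None: os.close(disk_fd)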
The code above prints the contents of that block for me.
This is how it works:
m=mmap.mmap(-1, 4096)
This allocates 4 KiB of memory using mmap. The returned memory is aligned to the start of a memory page, and the page size is usually also 4 KiB.
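If you want to confirm the alignment on your own machine, the following sketch reads the address of the mmap through ctypes. The helper name buffer_address is an assumption for illustration; it relies on ctypes.c_char.from_buffer, which accepts any writable buffer.

```python
import ctypes
import mmap


def buffer_address(buf):
    """Return the memory address of a writable buffer such as an mmap."""
    # The temporary ctypes object is released as soon as the address has
    # been taken, so the mmap can still be closed afterwards.
    return ctypes.addressof(ctypes.c_char.from_buffer(buf))


m = mmap.mmap(-1, 4096)
addr = buffer_address(m)
# An anonymous mapping starts on a page boundary.
print(hex(addr), addr % mmap.PAGESIZE == 0)
m.close()
```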
os.read cannot read into an existing buffer; it always returns a new bytes object. So here they use the following trick, creating a file object from the OS file descriptor:
f=os.fdopen(disk_fd, 'rb+', 0)
The above creates f, which is similar to what you get from a normal open: it has read, write, and close methods.
Then they call
f.readinto(m)
which allows them to read from the file object into a pre-allocated block of memory.
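As an aside, on Python 3.7 or later on Linux, os.preadv can read from a descriptor into writable buffers at a given offset, so the file-object wrapper is not strictly needed. The sketch below is one possible way to use it; the function name read_block and its direct parameter are assumptions for illustration, and I have not verified it against every kernel or device.

```python
import mmap
import os


def read_block(path, block_index, block_size=4096, direct=True):
    """Read one block from path into a page-aligned buffer."""
    flags = os.O_RDONLY | (os.O_DIRECT if direct else 0)
    fd = os.open(path, flags)
    try:
        buf = mmap.mmap(-1, block_size)  # page-aligned, writable memory
        try:
            # The offset is a multiple of block_size, so it is aligned too.
            n = os.preadv(fd, [buf], block_index * block_size)
            return buf[:n]  # copy the bytes out before closing the mmap
        finally:
            buf.close()
    finally:
        os.close(fd)
```

Calling read_block("/dev/sdc", 12345) would correspond to the read in the question. Passing direct=False skips O_DIRECT, which is handy for trying the function on a regular file.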
The final step is to extract the data from the mmapped memory block:
print(m.read(4096))
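The readinto approach should also work with the ctypes buffer from the update in the question. Note that ctypes.string_at returns a copy of the memory as bytes, so it cannot serve as the read target. A sketch, assuming a hypothetical helper named aligned_buffer, is to build a ctypes array that shares the over-allocated buffer at an aligned offset:

```python
import ctypes


def aligned_buffer(size, alignment=4096):
    """Return a writable ctypes array of size bytes at an aligned address."""
    # Over-allocate so that an aligned start always fits inside raw.
    raw = ctypes.create_string_buffer(size + alignment)
    # Number of bytes to skip to reach the next multiple of alignment.
    offset = -ctypes.addressof(raw) % alignment
    # from_buffer shares the memory of raw and keeps it alive.
    return (ctypes.c_char * size).from_buffer(raw, offset)


buf = aligned_buffer(4096)
print(ctypes.addressof(buf) % 4096 == 0)
```

With f created as above, n = f.readinto(buf) fills the array in place, and buf.raw[:n] gives the data as bytes.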