Here's the code I'm using:
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <malloc.h>
int main (int argc, char* argv[]) {
    int fd;
    int alignment = 4096;
    int bufsize = 4096 * 4096;
    char* buf = (char*) memalign(alignment, bufsize);
    int i, n, result=0;
    const char* fname = "1GB.txt";
    if ((fd = open(fname, O_RDONLY|O_DIRECT)) < 0) {
        /* two %s conversions need two arguments: program name, then file name */
        printf("%s: cannot open %s\n", argv[0], fname);
        exit(2);
    }
    while ( (n = read(fd,buf,bufsize)) > 0 )
        for (i=0; i<n; ++i)
            result += buf[i];
    printf("Result: %d\n", result);
    return 0;
}
Here's the command I'm running:
echo 1 > /proc/sys/vm/drop_caches
time ./a.out 1GB.txt
Without O_DIRECT, and after flushing the page cache, it takes only 1.1 seconds; with O_DIRECT it takes 2.5 seconds.
I tried changing the alignment and bufsize. Increasing the bufsize to 4096 * 4096 * 4 reduced the running time to 1.79 seconds. Increasing bufsize to 4096 * 4096 * 64 reduced the running time to 1.75 seconds. Reducing the alignment to 512 reduced the running time to 1.72 seconds. I don't know what else to try.
I don't understand why using O_DIRECT makes the code slower. Could it be due to the fact that I'm using disk encryption?
I'm on Debian 12, kernel 6.1.0-9-amd64.
UPDATE: Follow up: Why O_DIRECT is slower than plain read() even with read-ahead?
I think Linus summarizes O_DIRECT pretty well in this old mailing list thread, where someone was experiencing the same problem you are:
On Fri, 10 May 2002, Lincoln Dale wrote:
so O_DIRECT in 2.4.18 still shows up as a 55% performance hit versus no O_DIRECT. anyone have any clues?
Yes.
O_DIRECT isn't doing any read-ahead.
For O_DIRECT to be a win, you need to make it asynchronous.
The thing that has always disturbed me about O_DIRECT is that the whole interface is just stupid, and was probably designed by a deranged monkey on some serious mind-controlling substances [*].
It's simply not very pretty, and it doesn't perform very well either because of the bad interfaces (where synchronicity of read/write is part of it, but the inherent page-table-walking is another issue).
I bet you could get better performance more cleanly by splitting up the actual IO generation and the "user-space mapping" thing sanely.
So your reads are slower because O_DIRECT performs neither read-ahead nor caching, both of which you get by default without O_DIRECT.
Unless you are willing to request much larger reads, chunked reads can only really benefit from O_DIRECT if you implement asynchronous operations, for example using io_uring. Linus also suggests other interesting solutions in the mailing list thread linked above.
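To make the "asynchronous" part concrete, here is a minimal sketch of overlapping I/O with computation. It deliberately uses POSIX AIO (aio_read) with two buffers rather than io_uring, only so that it builds without liburing; the idea of keeping one read in flight while you sum the previous chunk is the same. The names sum_file_overlapped and submit_read are hypothetical helpers I made up, the 4096-byte alignment is an assumption that covers common block sizes, and the code assumes a regular file, where a short read means end of file.

```c
#define _GNU_SOURCE          /* exposes O_DIRECT in <fcntl.h> */
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: queue one asynchronous read of len bytes at off. */
static int submit_read(struct aiocb *cb, int fd, void *buf, size_t len, off_t off)
{
    memset(cb, 0, sizeof *cb);
    cb->aio_fildes = fd;
    cb->aio_buf = buf;
    cb->aio_nbytes = len;
    cb->aio_offset = off;
    return aio_read(cb);
}

/* Hypothetical helper: sum every byte of path while the next chunk is
 * already being read. Pass O_DIRECT in extra_flags to bypass the page
 * cache. Returns the sum, or -1 on error. */
long long sum_file_overlapped(const char *path, int extra_flags, size_t bufsize)
{
    void *buf[2] = { NULL, NULL };
    struct aiocb cb;
    long long result = 0;
    off_t offset = 0;
    int cur = 0, ok = 0;

    if (bufsize == 0)
        return -1;
    int fd = open(path, O_RDONLY | extra_flags);
    if (fd < 0)
        return -1;
    /* O_DIRECT needs aligned buffers; 4096 is an assumed safe alignment. */
    if (posix_memalign(&buf[0], 4096, bufsize) != 0 ||
        posix_memalign(&buf[1], 4096, bufsize) != 0)
        goto out;
    if (submit_read(&cb, fd, buf[cur], bufsize, offset) != 0)
        goto out;

    for (;;) {
        /* Wait for the read that is currently in flight. */
        const struct aiocb *pending[1] = { &cb };
        while (aio_error(&cb) == EINPROGRESS)
            aio_suspend(pending, 1, NULL);
        ssize_t n = aio_return(&cb);
        if (n < 0)
            break;

        const char *data = buf[cur];
        int last = (size_t)n < bufsize;   /* short read: assumed to be EOF */
        if (!last) {
            /* Queue the next chunk into the other buffer before summing. */
            offset += n;
            cur = 1 - cur;
            if (submit_read(&cb, fd, buf[cur], bufsize, offset) != 0)
                break;
        }
        for (ssize_t i = 0; i < n; ++i)
            result += data[i];
        if (last) {
            ok = 1;
            break;
        }
    }
out:
    free(buf[0]);
    free(buf[1]);
    close(fd);
    return ok ? result : -1;
}
```

You would call it as sum_file_overlapped("1GB.txt", O_DIRECT, 4096 * 4096). On glibc older than 2.34 you need to link with -lrt. I have not measured this on an encrypted disk, so treat it as a starting point for experiments rather than a guaranteed speedup; a deeper queue (more than one read in flight) is the obvious next thing to try.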