Direct I/O is the most performant way to copy larger files, so I wanted to add that ability to a program.
Windows offers FILE_FLAG_WRITE_THROUGH and FILE_FLAG_NO_BUFFERING in Win32's CreateFileA(). Linux, since 2.4.10, has the O_DIRECT flag for open().
Is there a way to achieve the same result portably within POSIX? Like how the Win32 API here works from Windows XP to Windows 11, it would be nice to do direct IO across all UNIX-like systems in one reliably portable way.
No, there is no POSIX standard for direct IO.
There are at least two different APIs and behaviors that exist as of January 2023. Linux, FreeBSD, and apparently IBM's AIX use an O_DIRECT flag to open(), while Oracle's Solaris uses a directio() function on an already-opened file descriptor.
The Linux use of the O_DIRECT flag to the POSIX open() function is documented on the Linux open() man page:
O_DIRECT (since Linux 2.4.10)
Try to minimize cache effects of the I/O to and from this file. In general this will degrade performance, but it is useful in special situations, such as when applications do their own caching. File I/O is done directly to/from user-space buffers. The O_DIRECT flag on its own makes an effort to transfer data synchronously, but does not give the guarantees of the O_SYNC flag that data and necessary metadata are transferred. To guarantee synchronous I/O, O_SYNC must be used in addition to O_DIRECT. See NOTES below for further discussion.
Linux does not clearly specify how direct IO interacts with other descriptors open on the same file, or what happens when the file is mapped using mmap(); nor does it clearly specify any alignment or size restrictions on direct IO read or write operations. In my experience, these are all filesystem-specific and have become less restrictive over time, but most Linux filesystems require page-aligned IO buffers, and many (most? all?) require, or at least used to require, page-sized reads or writes.
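To make the alignment point concrete, here is a minimal sketch of an O_DIRECT read on Linux. The helper name read_first_block_direct() and the 4096-byte DIO_ALIGN value are my own assumptions for illustration; the real requirement depends on the filesystem and device. The sketch also assumes that a filesystem without direct IO support rejects the open() with EINVAL, in which case it falls back to ordinary buffered IO.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* glibc only exposes O_DIRECT with this set */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0             /* not exposed here: degrade to buffered IO */
#endif

#define DIO_ALIGN 4096         /* assumed alignment and block size */

/*
 * Hypothetical helper: read up to one DIO_ALIGN-sized block from the start
 * of `path` into a freshly allocated, DIO_ALIGN-aligned buffer.
 * Returns the number of bytes read, or -1 on error. On success the caller
 * owns *out_buf and must free() it. *used_direct is set to 1 only if the
 * open() with O_DIRECT succeeded.
 */
ssize_t read_first_block_direct(const char *path, void **out_buf,
                                int *used_direct)
{
    assert(path != NULL && out_buf != NULL && used_direct != NULL);

    void *buf = NULL;
    /* posix_memalign() reports failure through its return value */
    if (posix_memalign(&buf, DIO_ALIGN, DIO_ALIGN) != 0)
        return -1;

    int fd = open(path, O_RDONLY | O_DIRECT);
    *used_direct = (O_DIRECT != 0 && fd >= 0);
    if (fd < 0 && errno == EINVAL)
        fd = open(path, O_RDONLY);   /* O_DIRECT refused: buffered retry */
    if (fd < 0) {
        free(buf);
        return -1;
    }

    /* Aligned buffer, aligned offset (0), block-sized request. */
    ssize_t n = read(fd, buf, DIO_ALIGN);
    close(fd);
    if (n < 0) {
        free(buf);
        return -1;
    }
    *out_buf = buf;
    return n;
}
```

Keeping the buffer address, file offset, and request size all multiples of the same block size is the conservative choice; a filesystem with looser rules will accept it too.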
FreeBSD follows the Linux model: passing an O_DIRECT flag to open():
O_DIRECT may be used to minimize or eliminate the cache effects of reading and writing. The system will attempt to avoid caching the data you read or write. If it cannot avoid caching the data, it will minimize the impact the data has on the cache. Use of this flag can drastically reduce performance if not used with care.
OpenBSD does not support direct IO. There's no mention of direct IO in either the OpenBSD open() or the OpenBSD fcntl() man pages.
IBM's AIX appears to support a Linux-type O_DIRECT flag to open(), but actual published IBM AIX man pages don't seem to be generally available.
SGI's Irix also supported the Linux-style O_DIRECT flag to open():
O_DIRECT
If set, all reads and writes on the resulting file descriptor will be performed directly to or from the user program buffer, provided appropriate size and alignment restrictions are met. Refer to the F_SETFL and F_DIOINFO commands in the fcntl(2) manual entry for information about how to determine the alignment constraints. O_DIRECT is a Silicon Graphics extension and is only supported on local EFS and XFS file systems, and remote BDS file systems.
Of interest, the XFS file system on Linux originated with SGI's Irix.
Solaris uses a completely different interface. Its dedicated directio() function sets direct IO on a per-file basis:
Description
The directio() function provides advice to the system about the expected behavior of the application when accessing the data in the file associated with the open file descriptor fildes. The system uses this information to help optimize accesses to the file's data. The directio() function has no effect on the semantics of the other operations on the data, though it may affect the performance of other operations.
The advice argument is kept per file; the last caller of directio() sets the advice for all applications using the file associated with fildes.
Values for advice are defined in <sys/fcntl.h>.
DIRECTIO_OFF
Applications get the default system behavior when accessing file data.
When an application reads data from a file, the data is first cached in system memory and then copied into the application's buffer (see read(2)). If the system detects that the application is reading sequentially from a file, the system will asynchronously "read ahead" from the file into system memory so the data is immediately available for the next read(2) operation.
When an application writes data into a file, the data is first cached in system memory and is written to the device at a later time (see write(2)). When possible, the system increases the performance of write(2) operations by cacheing the data in memory pages. The data is copied into system memory and the write(2) operation returns immediately to the application. The data is later written asynchronously to the device. When possible, the cached data is "clustered" into large chunks and written to the device in a single write operation.
The system behavior for DIRECTIO_OFF can change without notice.
DIRECTIO_ON
The system behaves as though the application is not going to reuse the file data in the near future. In other words, the file data is not cached in the system's memory pages.
When possible, data is read or written directly between the application's memory and the device when the data is accessed with read(2) and write(2) operations. When such transfers are not possible, the system switches back to the default behavior, but just for that operation. In general, the transfer is possible when the application's buffer is aligned on a two-byte (short) boundary, the offset into the file is on a device sector boundary, and the size of the operation is a multiple of device sectors.
This advisory is ignored while the file associated with fildes is mapped (see mmap(2)).
The system behavior for DIRECTIO_ON can change without notice.
Notice also the behavior on Solaris is different: if direct IO is enabled on a file by any process, all processes accessing that file will do so via direct IO (Solaris 10+ has no alignment or size restrictions on direct IO, so switching between direct IO and "normal" IO won't break anything*). And if a file is mapped via mmap(), direct IO on that file is disabled entirely.
* That's not quite true. If you're using a SAMFS or QFS filesystem in shared mode and access data from the filesystem's active metadata controller (where the filesystem must, by design, be mounted with the Solaris forcedirectio mount option so that all access on that one system in the cluster is done via direct IO), then disabling direct IO for a file using directio( fd, DIRECTIO_OFF ) will corrupt the filesystem. Oracle's own top-end RAC database would do that if you did a database restore on the QFS metadata controller, and you'd wind up with a corrupt filesystem.
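Since there is no standard, the closest thing to "portable" is a small compile-time wrapper over the two interfaces described above. The following is only a sketch: open_direct() is a hypothetical helper name, the choice to fall back silently to buffered IO is my own policy rather than anything the platforms mandate, and it covers only the O_DIRECT and directio() mechanisms discussed here.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* Linux/glibc: needed to expose O_DIRECT */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>
#if defined(__sun)
#include <sys/fcntl.h>         /* Solaris: directio(), DIRECTIO_ON */
#endif

/*
 * Hypothetical helper: open `path` and ask for direct IO using whichever
 * mechanism this platform provides. Returns a file descriptor or -1.
 * *direct is set to 1 only if the direct IO request was accepted, so the
 * caller knows whether alignment rules may apply to later reads and writes.
 */
int open_direct(const char *path, int flags, mode_t mode, int *direct)
{
    assert(path != NULL && direct != NULL);
    *direct = 0;

#if defined(__sun)
    /* Solaris model: open normally, then advise on the descriptor. */
    int fd = open(path, flags, mode);
    if (fd >= 0 && directio(fd, DIRECTIO_ON) == 0)
        *direct = 1;
    return fd;
#elif defined(O_DIRECT)
    /* Linux/FreeBSD model: request direct IO at open() time. */
    int fd = open(path, flags | O_DIRECT, mode);
    if (fd >= 0) {
        *direct = 1;
        return fd;
    }
    if (errno != EINVAL)
        return -1;
    /* Assumed meaning of EINVAL: O_DIRECT refused, so retry buffered. */
    return open(path, flags, mode);
#else
    /* No known direct IO mechanism (e.g. OpenBSD): plain buffered open. */
    return open(path, flags, mode);
#endif
}
```

A caller would use it like open(), for example open_direct("big.bin", O_RDONLY, 0, &direct), and then size and align its buffers conservatively whenever direct comes back as 1. Be careful combining this with O_CREAT | O_EXCL: if the first open() creates the file before failing, the buffered retry will see EEXIST.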