
When is a write to disk triggered?


In Python, I can open a file with f = open(<filename>, <mode>). This returns an object f, which I can write to using f.write(<some data>).

If, at this point, I access the original file (e.g. with cat from a terminal), it appears empty: Python stored the data I wrote in the object f and not in the actual on-disk file. If I then call f.close(), the data in f is persisted to the on-disk file (and I can access it from other programs).
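A minimal sketch that reproduces this observation, assuming a regular file in a temporary directory. The function name observe_buffering and the file name demo.txt are made up for illustration:

```python
import os
import tempfile


def observe_buffering(payload="hello"):
    """Return the on-disk size of a file before and after close()."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.txt")     # hypothetical scratch file
        f = open(path, "w")
        f.write(payload)                         # lands in Python's buffer
        size_before = os.path.getsize(path)      # what other programs see now
        f.close()                                # flushes the buffer
        size_after = os.path.getsize(path)
    return size_before, size_after


print(observe_buffering())  # (0, 5)
```

The payload is far smaller than the buffer, so nothing reaches the file until close() flushes it.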

I assume data is buffered to improve latency. However, what happens if the buffered data grows a lot? Will Python initiate a write? If so, details on the internals (what influences the buffer size? is the disk I/O handled within Python or by another program/thread? is there a chance Python will just hang during the write?) would be much appreciated.


Solution

  • The general subject of I/O buffering has been treated many times (including in questions linked from the comments). But to answer your specific questions:

    • By default, when writing to a terminal (“the screen”), a newline causes the text to be flushed up through it. For all files, the buffer is flushed each time it fills. (Large single writes might flush any existing buffer contents and then bypass it.)
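    • The buffer has a fixed size and is allocated before any data is written; Python 3 doesn’t use stdio, so it chooses its own buffer sizes. (A few kB is typical, based on the file’s block size or io.DEFAULT_BUFFER_SIZE; the buffering argument to open() overrides the default.)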
    • The “disk I/O” (really kernel I/O, which is distinguishable only in certain special circumstances like network/power failure) happens within whatever Python write triggers the flush.
    • Yes, it can hang, if the file is a pipe to a busy process, a socket over a slow network, a special device, or even a regular file mounted from a remote machine.
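The points above can be observed with a small experiment. This is only a sketch under stated assumptions: the buffer size is set explicitly through open()'s buffering argument so the result does not depend on the platform default, and trace_flushes, BUFFER_SIZE, CHUNK, and demo.bin are illustrative names, not part of any API. It writes small chunks, records the on-disk size after each write, then calls flush() followed by os.fsync() to push the data from the kernel to the storage device.

```python
import os
import tempfile

BUFFER_SIZE = 4096        # assumption: explicit size keeps the demo predictable
CHUNK = b"x" * 100        # each write is much smaller than the buffer


def trace_flushes(n_chunks=100):
    """Write small chunks and record the on-disk size after each write."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.bin")       # hypothetical scratch file
        sizes = []
        with open(path, "wb", buffering=BUFFER_SIZE) as f:
            for _ in range(n_chunks):
                f.write(CHUNK)                     # a flush, if any, happens inside this call
                sizes.append(os.path.getsize(path))
            f.flush()                              # Python buffer -> kernel
            os.fsync(f.fileno())                   # kernel cache -> storage device
            after_flush = os.path.getsize(path)
    return sizes, after_flush


sizes, after_flush = trace_flushes()
print("on disk after first write:", sizes[0])
print("distinct on-disk sizes seen:", sorted(set(sizes)))
print("after flush + fsync:", after_flush)
```

The on-disk size stays at 0 for the first writes and then jumps in steps as the buffer fills. No background thread is involved: the write() call that overflows the buffer is the one that performs the I/O, which is also why that call is the one that can block.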