
How are file objects cleaned up in Python when the process is killed?


What happens to a file object in Python when the process is terminated? Does it matter whether Python is terminated with SIGTERM, SIGKILL, SIGHUP (etc.) or by a KeyboardInterrupt exception?

I have some logging scripts that continually acquire data and write it to a file. I don't care about doing any extra cleanup; I just want to make sure the log file is not corrupted when Python is abruptly terminated (e.g. I could leave it running in the background and just shut down the computer). I made the following test scripts to see what happens:

termtest.sh:

for i in $(seq 1 10); do
    python termtest.py $i & export pypid=$!
    sleep 0.3
    echo $pypid
    kill -SIGTERM $pypid
done

termtest.py:

import csv
import os
import signal
import sys

end_loop = False


def handle_interrupt(*args):
    # Flip a flag so the main loop can exit cleanly instead of the
    # default KeyboardInterrupt firing mid-write.
    global end_loop
    end_loop = True


signal.signal(signal.SIGINT, handle_interrupt)

with open('test' + str(sys.argv[-1]) + '.txt', 'w') as csvfile:
    writer = csv.writer(csvfile)
    for idx in range(int(1e7)):
        writer.writerow((idx, 'a' * 60000))
        csvfile.flush()             # push Python-level buffers to the OS
        os.fsync(csvfile.fileno())  # ask the OS to commit the data to disk
        if end_loop:
            break

I ran termtest.sh with different signals, changing SIGTERM to SIGINT, SIGHUP, and SIGKILL in turn. (Note: I put an explicit handler in termtest.py for SIGINT, since Python's default handling of that signal is just to raise KeyboardInterrupt, as with Ctrl+C.) In all cases, all of the output files contained only complete rows (no partial writes) and did not appear corrupted. I added the flush() and fsync() calls to push the data to disk as aggressively as possible, so that the script had the greatest chance of being interrupted mid-write.
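For reference, a minimal sketch of how one could check the output files for partial rows (the glob pattern and the expected field contents below just mirror what termtest.py writes):

import csv
import glob

# Every row termtest.py writes has exactly two fields, the second
# being 60000 'a' characters. A truncated final row fails this test.
for path in sorted(glob.glob('test*.txt')):
    with open(path, newline='') as f:
        for rownum, row in enumerate(csv.reader(f), 1):
            if len(row) != 2 or row[1] != 'a' * 60000:
                print(f'{path}: partial row at row {rownum}')
                break
        else:
            print(f'{path}: all rows complete')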

So can I conclude that Python always completes a write when it is terminated and does not leave a file in an intermediate state? Or does this depend on the operating system and file system (I was testing with Linux and an ext4 partition)?


Solution

  • It's not a question of how files are "cleaned up" so much as how they are written. A program might perform multiple writes for a single "chunk" of data (a row, or whatever), and an interrupt in the middle of that sequence would leave partial records in the file.

    Looking at the C source for the csv module, it assembles each row into a string buffer, then writes it with a single write() call. That should generally be safe: either the row reaches the OS or it doesn't, and once it reaches the OS the whole row gets written or none of it does (barring things like hardware issues, where part of it could land in a bad sector).
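    To illustrate the distinction (a sketch, not the csv module's actual C code): building the whole record first and handing it to the OS in one call keeps the record atomic with respect to a signal, whereas writing field by field does not:

import os

def write_record(fd, fields):
    # Mirror the behaviour described above: assemble the full row as
    # one buffer, then issue a single write(2) for it. A signal that
    # arrives before this call loses the whole row; one that arrives
    # after it leaves the whole row with the kernel.
    record = (','.join(fields) + '\r\n').encode()
    os.write(fd, record)

fd = os.open('out.csv', os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
write_record(fd, ['1', 'a' * 60000])
os.close(fd)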

    The object being written to is a Python object, and a custom writer could do something odd in its write() that breaks this guarantee, but assuming it's a regular file object, it should be fine.
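    For contrast, here is a hypothetical file-like object that would reintroduce the partial-record risk, since csv.writer only requires a write() method and a signal can land between the underlying writes:

class ChunkedWriter:
    """Hypothetical wrapper that splits each write into small pieces."""

    def __init__(self, raw, chunk_size=4096):
        self.raw = raw
        self.chunk_size = chunk_size

    def write(self, data):
        # csv.writer hands us one complete row per call, but we break
        # it into several OS-level writes; termination between any two
        # of them leaves a partial row in the file.
        for i in range(0, len(data), self.chunk_size):
            self.raw.write(data[i:i + self.chunk_size])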