What happens to a file object in Python when the process is terminated? Does it matter whether Python is terminated with SIGTERM, SIGKILL, SIGHUP (etc.) or by a KeyboardInterrupt exception?
I have some logging scripts that continually acquire data and write it to a file. I don't care about doing any extra cleanup, but I want to make sure the log file is not corrupted when Python is abruptly terminated (e.g. I could leave it running in the background and just shut down the computer). I made the following test scripts to try to see what happens:
termtest.sh:
for i in $(seq 1 10); do
    python termtest.py $i & export pypid=$!
    sleep 0.3
    echo $pypid
    kill -SIGTERM $pypid
done
termtest.py:
import csv
import os
import signal
import sys

end_loop = False

def handle_interrupt(*args):
    global end_loop
    end_loop = True

# By default SIGINT raises KeyboardInterrupt; handle it like the other signals.
signal.signal(signal.SIGINT, handle_interrupt)

with open('test' + str(sys.argv[-1]) + '.txt', 'w') as csvfile:
    writer = csv.writer(csvfile)
    for idx in range(int(1e7)):
        writer.writerow((idx, 'a' * 60000))
        # Push each row out of Python's buffers and onto disk immediately.
        csvfile.flush()
        os.fsync(csvfile.fileno())
        if end_loop:
            break
I ran termtest.sh with different signals (changing SIGTERM to SIGINT, SIGHUP, and SIGKILL in termtest.sh). (Note: I put an explicit handler in termtest.py for SIGINT, since Python's only default handling of that signal is the Ctrl+C behavior of raising KeyboardInterrupt.) In all cases, all of the output files had only complete rows (no partial writes) and did not appear corrupted. I added the flush() and fsync() calls to make sure the data was being written to disk as often as possible, so that the script had the greatest chance of being interrupted mid-write.
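A quick way to check the output for partial rows is a scan along these lines (a sketch; it assumes the test*.txt files and the fixed 60000-character payload produced by the scripts above):

import glob

# Rough integrity check (sketch): every line should be "<number>,<60000 a's>".
for path in sorted(glob.glob('test*.txt')):
    with open(path, newline='') as f:
        for lineno, line in enumerate(f, 1):
            fields = line.rstrip('\r\n').split(',')
            if len(fields) != 2 or not fields[0].isdigit() or fields[1] != 'a' * 60000:
                print('%s:%d: partial or corrupt row' % (path, lineno))
                break
        else:
            print('%s: all rows complete' % path)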
So can I conclude that Python always completes a write when it is terminated and does not leave a file in an intermediate state? Or does this depend on the operating system and file system (I was testing with Linux and an ext4 partition)?
It's not a question of how files are "cleaned up" so much as how they are written. A program might perform multiple writes for a single "chunk" of data (a row, or whatever), and if you interrupt it in the middle of that sequence you end up with partial records written.
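For example, a hand-rolled writer along these lines (hypothetical; this is not how the csv module behaves) issues several write() calls per record, and a signal landing between any two of them leaves a partial row on disk:

# Hypothetical fragile writer: one record becomes four write() calls, so
# termination between any two of them leaves a truncated record behind.
def write_row_fragile(f, idx, payload):
    f.write(str(idx))  # killed here: the row is just the index
    f.write(',')       # killed here: index plus a dangling comma
    f.write(payload)   # killed here: data present but no line terminator yet
    f.write('\r\n')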
Looking at the C source for the csv module, it assembles each row into a string buffer, then writes it using a single write() call. That should generally be safe: either the row is passed to the OS or it isn't, and if it gets to the OS it's all going to get written or it's not (barring, of course, things like hardware issues where part of it could go into a bad sector).
The writer object is a Python object, and a custom writer could do something weird in its write() that could break this, but assuming it's a regular file object, it should be fine.
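If you want to confirm the single-write behavior without reading the C source, a thin counting wrapper around the file object will show it (a sketch; it relies only on the documented fact that csv.writer accepts any object with a write() method):

import csv

class CountingFile:
    # Wraps a file-like object and counts write() calls (sketch).
    def __init__(self, f):
        self.f = f
        self.writes = 0
    def write(self, data):
        self.writes += 1
        return self.f.write(data)

with open('probe.txt', 'w', newline='') as raw:
    probe = CountingFile(raw)
    writer = csv.writer(probe)
    writer.writerow((1, 'a' * 60000))
    print(probe.writes)  # 1: the whole row went through a single write() call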