I have a program that takes a file, compresses it using /usr/bin/zip, /bin/gzip, or /bin/bzip2, and removes the original if and only if the compress operation completes successfully.
However, this program can be killed (via kill -9), or, in principle, can even crash on its own!
Question: Can I assume that the compressed output file that gets created on disk is always valid, without ever having to decompress it and compare it with the original?
In other words, no matter at what point the compress operation is ungracefully interrupted, does the fact that the compressed output file exists on disk imply that it is valid?
In other words, are the compress operation and the file creation on disk together an atomic transaction?
The main concern here is to avoid removing the original file when the compressed file is invalid, without having to undergo the costly decompress-and-compare operation.
Note:
Ignore OS file buffers that fail to flush to disk because of a UPS failure.
Ignore disk/media-related failures. These can happen much later anyway, quite independently of the program's interruption.
A. Yes, if zip, gzip, or bzip2 completes successfully, you can assume with high probability that the resulting compressed file is valid. Those programs have been around a loooong time, and I would assert that very nearly all data integrity bugs were worked out of them long ago. You also need to consider the reliability of your hardware in its operating environment.
B. (Your "in other words" rephrasings actually ask entirely different questions.) No. An ungracefully interrupted compress operation will generally leave a partial, invalid compressed file behind.
C. No. The file is created first and then written to one chunk at a time. Those operations are certainly not atomic.
You just need to verify that the compression utility completed successfully, i.e. that it exited normally with an exit code of zero. Then you do not need to examine the compressed file unless you are super paranoid, perhaps because the data is very valuable to you.
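As a rough illustration of the exit-status approach, here is a minimal Python sketch. It assumes the compressor writes its output to stdout (as `gzip -c` and `bzip2 -c` do); the function name `compress_and_remove` and its `suffix` parameter are hypothetical, not part of any tool. On POSIX, `subprocess` reports a child killed by a signal as a negative return code (e.g. -9 for `kill -9`), so that case falls into the same "not zero, keep the original" branch.

```python
import os
import subprocess
from typing import Sequence


def compress_and_remove(src: str,
                        cmd: Sequence[str] = ("gzip", "-c"),
                        suffix: str = ".gz") -> bool:
    """Compress src into src + suffix; delete src only if the compressor succeeds.

    Hypothetical helper. Assumes cmd writes compressed data to stdout, as
    `gzip -c` and `bzip2 -c` do. Returns True if the original was removed.
    """
    dst = src + suffix
    with open(dst, "wb") as out:
        # The compressor's exit status is the only success signal relied on.
        result = subprocess.run([*cmd, src], stdout=out)

    if result.returncode != 0:
        # Non-zero means failure; on POSIX a negative value means the child
        # was killed by that signal (e.g. -9 for SIGKILL). The partial output
        # is generally invalid, so discard it and keep the original.
        os.remove(dst)
        return False

    # Removing the original is the very last step, so if this wrapper itself
    # is killed at any earlier point, the original survives.
    os.remove(src)
    return True
```

For bzip2 you would call something like `compress_and_remove("data.log", ("bzip2", "-c"), ".bz2")`. The design choice is simply ordering: deletion of the original happens only after a zero exit status, so an interruption anywhere before that leaves the original intact (possibly alongside a partial compressed file).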
I should note that verifying the compressed data takes a fraction of the time it takes to compress it, at least for zip and gzip. For bzip2, verification takes about as long as the original compression did.
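Verification does not require comparing against the original: the gzip format stores a CRC-32 and the uncompressed length in each member's trailer, and the command-line tools expose a test mode (`gzip -t`, `bzip2 -t`, `unzip -t`). The Python sketch below does the equivalent for gzip files only by streaming through the data and discarding it; Python's `gzip` module checks the trailer as it reads. The function name `gzip_is_valid` and the chunk size are illustrative assumptions.

```python
import gzip
import os
import zlib


def gzip_is_valid(path: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if path is a complete, uncorrupted gzip file.

    Hypothetical helper. Streams through the data and discards it; Python's
    gzip module checks each member's stored CRC-32 and length while reading,
    so no comparison with the original file is needed.
    """
    # A zero-length file (created but never written before a kill) would
    # otherwise read back as an empty, "valid" stream.
    if os.path.getsize(path) == 0:
        return False
    try:
        with gzip.open(path, "rb") as f:
            while f.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # OSError covers gzip.BadGzipFile (bad header, CRC/length mismatch);
        # EOFError means the stream was truncated; zlib.error means the
        # compressed data itself is corrupt.
        return False
```

In the question's workflow, such a check would sit between a zero exit status and deleting the original, for the case where the data justifies the extra read.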