I need to compress a file on Linux using tar, invoked from a thread in my process. The file is being updated about once a second.
I am running the following command to do that from a separate process.
tar -cvzf /destination/compressed_files.tar.gz /directory/to/archive
It works well and I have found no issues so far, but I have the following questions.
Primary question:
I am trying to make my code a bit safer and more reliable. Is the above approach safe from the perspective of the tar command? Does tar implicitly take care of the fact that the file is being updated and compress whatever is possible? Does tar make a copy of the content internally?
Secondary question:
I found that the following is another way to run tar:
tar -cvzf /destination/compressed_files.tar.gz -C /destination /directory/to/archive
Looks like the -C option changes the directory? Is it safer to use -C here?
The behavior you will get depends on a bunch of factors, including the type of filesystem (e.g. NFS vs a local disk), the way the file is being written, and how much data is appended each time.
In the best-case scenario, the writer has a local file open in append mode and writes lines of text no longer than some internal buffer size. In this case, you will probably not see any problems. If the lines are very long, you may see partial lines. If the file is not opened in append mode, or the filesystem is NFS or something else unusual, you may see zeros or garbage at the end of the archived file (because the file length is updated before the content is written).
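If you do archive the file while it is still being written, you can at least detect when tar noticed it changing underneath it. With GNU tar (an assumption; other tar implementations may behave differently), an exit status of 1 means some files changed while they were being read, so a small wrapper can log that and decide whether to keep or retry the archive. A sketch:

#!/bin/sh
# Sketch: archive a directory that may be written to concurrently and
# detect whether GNU tar saw a file change mid-read. Paths are the
# placeholders from the question.
tar -czf /destination/compressed_files.tar.gz /directory/to/archive
status=$?
if [ "$status" -eq 1 ]; then
    # GNU tar exits with 1 when "some files differ", e.g. a file changed
    # while it was being archived; the archive is still created but may
    # hold a partial snapshot of that file.
    echo "warning: files changed while being archived" >&2
elif [ "$status" -gt 1 ]; then
    echo "error: tar failed with status $status" >&2
    exit "$status"
fi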
In general, it would be better not to rely on archiving a file that is still being written. A typical approach is to "roll" the file every so often, closing the old one and starting to write a new one. Then you can archive only "complete" files which the writer has closed (and perhaps renamed to indicate they are complete).
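A minimal sketch of that pattern, assuming the writer renames each finished file to give it a .done suffix (the suffix is a hypothetical convention; the paths are the ones from the question):

#!/bin/sh
# Sketch: archive only files the writer has already closed and renamed.
# Assumes a hypothetical convention where a finished file is renamed to
# end in .done; adjust the pattern to your own naming scheme.
cd /directory/to/archive || exit 1
# Collect completed files; the file currently being written is skipped
# because it has not been renamed yet.
set -- *.done
if [ -e "$1" ]; then
    tar -czf /destination/compressed_files.tar.gz "$@"
    # After a successful archive you would typically move or delete the
    # .done files so they are not picked up again on the next run.
fi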