I need to use a compressor like xz to compress huge tar archives.
I am fully aware of previous questions like Create a tar.xz in one command and Utilizing multi core for tar+gzip/bzip compression/decompression
From them, I have found that this command line mostly works:
tar -cvf - paths_to_archive | xz -1 -T0 -v > OUTPUT_FILE.tar.xz
I use the pipe solution because I absolutely must be able to pass options to xz. In particular, xz is very CPU intensive, so I must use -T0 to use all available cores. This is why I am not using other possibilities, like tar's --use-compress-program, or -J options.
Unfortunately, I really want to capture all of tar's and xz's log output (i.e. non-archive output) in a log file. In the example above, that log output is generated by the -v options.
With the command line above, that log output is now printed on my terminal.
So, the problem is that when you use pipes to connect tar and xz as above, you cannot end the command line with something like
>Log_File 2>&1
because of that earlier
> OUTPUT_FILE.tar.xz
Is there a solution?
I tried wrapping in a subshell like this
(tar -cvf - paths_to_archive | xz -1 -T0 -v > OUTPUT_FILE.tar.xz) >Log_File 2>&1
but that did not work.
The normal stdout of tar is the tarball, and the normal stdout of xz is the compressed file. Neither of these is a log that you should want to capture. All logging, as opposed to the output files themselves, is written exclusively to stderr by both processes.
Consequently, you need only redirect stderr, and must not redirect stdout unless you want your output file mixed up with your logging.
{ tar -cvf - paths_to_archive | xz -1 -T0 -v > OUTPUT_FILE.tar.xz; } 2>Log_File
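Because each process has its own stderr, the two logs can also be kept apart instead of interleaved. The following is a minimal sketch, assuming bash; the function name archive_with_split_logs and its argument order are hypothetical and only serve the illustration:

```bash
#!/usr/bin/env bash
# Hypothetical helper: archive_with_split_logs OUTPUT TAR_LOG XZ_LOG TAR_ARGS...
# Each side of the pipe gets its own stderr redirection; stdout is left alone
# because it carries the archive data.
archive_with_split_logs() {
  local out=$1 tar_log=$2 xz_log=$3
  shift 3
  tar -cvf - "$@" 2> "$tar_log" | xz -1 -T0 -v 2> "$xz_log" > "$out"
}
```

It would be called as archive_with_split_logs OUTPUT_FILE.tar.xz tar.log xz.log paths_to_archive. The return status is that of xz, the last command in the pipeline.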
By the way, if you're curious about why xz -v prints more content when its stderr goes to the TTY, the answer is in this line of message.c: the progress_automatic flag (which tells xz to set a timer that triggers a SIGALRM every second, which xz in turn treats as a cue to print status) is only set when isatty(STDERR_FILENO) is true. Thus, after stderr has been redirected to a file, xz no longer prints this output at all; the problem is not that it isn't correctly redirected, but that it no longer exists.
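The same kind of check can be reproduced from the shell, where [ -t 2 ] tests whether file descriptor 2 is a terminal, much as isatty(STDERR_FILENO) does in C. A minimal sketch (the function name stderr_kind is hypothetical):

```bash
#!/usr/bin/env bash
# Report whether this function's stderr is attached to a terminal.
# [ -t 2 ] is the shell counterpart of isatty(STDERR_FILENO).
stderr_kind() {
  if [ -t 2 ]; then
    echo "terminal"
  else
    echo "not a terminal"
  fi
}

# With stderr redirected, the check fails regardless of where the script runs.
stderr_kind 2>/dev/null
```

Under redirection the check fails, which is why the per-second progress lines are absent from the log file.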
You can, however, send SIGALRM to xz every second from your own code, if you're really so inclined:
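{
  # Run xz in the background, fed by tar through process substitution (bash).
  xz -1 -T0 -v > OUTPUT_FILE.tar.xz < <(tar -cvf - paths_to_archive) & xz_pid=$!
  # Ask xz for a status line once per second until it has exited.
  while sleep 1; do
    kill -ALRM "$xz_pid" 2>/dev/null || break
  done
  # Reap xz so that the group's exit status is xz's exit status.
  wait "$xz_pid"
} 2>Log_File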
(Code that avoids rounding the time xz needs up to the nearest second is possible, but is left as an exercise for the reader.)
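One possible approach to that exercise, offered as a sketch rather than a definitive solution: run the once-per-second signalling loop in the background as well, wait on xz directly, and stop the loop as soon as xz exits. This assumes bash; the function name compress_with_progress and its argument order are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compress_with_progress OUTPUT LOG TAR_ARGS...
compress_with_progress() {
  local out=$1 log=$2
  shift 2
  local xz_pid ticker_pid rc
  {
    # xz runs in the background, fed by tar through process substitution.
    xz -1 -T0 -v > "$out" < <(tar -cvf - "$@") & xz_pid=$!
    # The ticker also runs in the background, so waiting for xz is no longer
    # tied to the one-second sleep.
    ( while sleep 1; do
        kill -ALRM "$xz_pid" 2>/dev/null || break
      done ) & ticker_pid=$!
    # Return as soon as xz finishes, keeping its exit status.
    wait "$xz_pid"
    rc=$?
    # Stop the ticker; discard the shell's notice about the terminated job.
    kill "$ticker_pid" 2>/dev/null
    wait "$ticker_pid" 2>/dev/null
  } 2> "$log"
  return "$rc"
}
```

It would be called as compress_with_progress OUTPUT_FILE.tar.xz Log_File paths_to_archive. A sleep started by the ticker may linger for up to a second after the function returns, but nothing waits on it.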