I am creating archives of very large directories and splitting these archives into smaller parts as follows:
tar -cvf - target_dir | pigz > target_dir.tar.gz
md5sum target_dir.tar.gz > md5sum.txt
split -n 10 target_dir.tar.gz target_dir.tar.gz.part-
The problem with this approach is that I need roughly twice the disk space of the tar.gz file (the archive plus its parts), which is a problem because some of the target directories are huge (TBs).
I could pipe the tar output into split to reduce the required disk space:
tar -cvf - target_dir | pigz | split -n 10 - target_dir.tar.gz.part-
But how would I calculate the md5sum of the tar.gz file before it goes into split?
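Use tee to duplicate the stream. Bash process substitution, >(...), starts md5sum as a separate process that reads its copy of the stream through a temporary pipe (a FIFO or /dev/fd entry), so the checksum is computed while split writes the parts and the full archive never has to exist on disk.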
tar -cvf - target_dir |
pigz |
tee >(md5sum > md5sum.txt) |
split -n 10 - target_dir.tar.gz.part-
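
If this runs inside a script that reads md5sum.txt right afterwards, keep in mind that a process substitution runs asynchronously, so md5sum may still be writing when the pipeline returns. Below is a minimal sketch of a variant that uses an explicit named pipe so the script can wait for md5sum, and then checks the parts against the recorded checksum without reassembling the archive on disk. The demo data, the name sum.fifo, the gzip fallback and the 50K part size are my own assumptions for illustration, not part of the original commands.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical demo data so the sketch runs on its own;
# in real use, point the pipeline at the actual target_dir instead.
workdir=$(mktemp -d)
cd "$workdir"
mkdir target_dir
head -c 200000 /dev/urandom > target_dir/sample.bin

# Assumption: fall back to gzip when pigz is not installed.
compressor=$(command -v pigz || command -v gzip)

# Explicit FIFO instead of >(...), so the checksum job has a PID to wait on.
mkfifo sum.fifo
md5sum < sum.fifo > md5sum.txt &
sum_pid=$!

# Archive, compress, checksum and split in a single pass.
# -b uses a fixed part size, so split does not need the total stream size.
tar -cf - target_dir |
  "$compressor" |
  tee sum.fifo |
  split -b 50K - target_dir.tar.gz.part-

# Make sure md5sum.txt is complete before reading it.
wait "$sum_pid"
rm sum.fifo

# Verify: stream the parts back together and compare checksums.
expected=$(cut -d' ' -f1 md5sum.txt)
actual=$(cat target_dir.tar.gz.part-* | md5sum | cut -d' ' -f1)

if [ "$expected" = "$actual" ]; then
  echo "checksums match"
else
  echo "checksum mismatch" >&2
  exit 1
fi
```

Two caveats. Because md5sum reads from a pipe, md5sum.txt records the file name as "-", so md5sum -c will not work against it directly; comparing the hash field as above sidesteps that. Also, as far as I know, split -n has to know the total input size, which a pipe cannot provide, so depending on the coreutils version it may either refuse piped input or buffer it in a temporary file first. A fixed part size with -b avoids the question; pick a size that suits the real data.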