
How to tar files with a size limit?


I am working on backing up my server data.

Some folders hold around 600GB of data, and I need to tar each one as 6 files of 100GB each.

I googled it and got some ideas (similar topic #1, similar topic #2, and so on). We can achieve it with

tar cvzf - data/ | split --bytes=100GB - sda1.backup.tar.gz.

We can also untar it with

cat sda1.backup.tar.gz.* | tar xzvf -
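
To check that the split and rejoin steps above really round-trip, here is a small-scale sketch. The temp directory, the file name `sample.bin`, the `demo.tar.gz.` prefix and the 100 KB piece size are all made up for the demo; the real job would use `data/` and `--bytes=100GB`. It assumes GNU `split` and `tar`.

```shell
# Small-scale round trip: archive, split, rejoin, extract, compare.
# All paths and sizes here are hypothetical demo values.
workdir=$(mktemp -d)
mkdir -p "$workdir/data" "$workdir/restore"

# About 300 KB of incompressible data, so the archive spans several pieces.
head -c 300000 /dev/urandom > "$workdir/data/sample.bin"

# Archive and split into 100 KB pieces (suffixes aa, ab, ac, ...).
tar czf - -C "$workdir" data | split --bytes=100000 - "$workdir/demo.tar.gz."

# Rejoin the pieces in glob order and extract into a separate directory.
cat "$workdir"/demo.tar.gz.* | tar xzf - -C "$workdir/restore"

# cmp exits 0 only if the restored file is byte-identical to the source.
cmp "$workdir/data/sample.bin" "$workdir/restore/data/sample.bin" && roundtrip=ok
pieces=$(ls "$workdir"/demo.tar.gz.* | wc -l)
echo "round trip: $roundtrip, pieces: $pieces"
```

The pieces are only usable together: a single piece in the middle is not a valid gzip stream on its own, so all of them must be present and concatenated in order.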

My question is: is there any way to do this job in parallel (each tar as a separate process)? It takes a long time to complete.

Or is there any other way to do this?

EDIT
Experiment:

# date;tar czf - ../saravana | split --bytes=1073741824 - data_bkp.;date
Wed May 18 09:28:32 MDT 2016
tar: Removing leading `../' from member names
tar: ../saravana: file changed as we read it
Wed May 18 09:51:08 MDT 2016

Result

-rw-r--r--  1 root root 1073741824 May 18 09:31 data_bkp.aa
-rw-r--r--  1 root root 1073741824 May 18 09:34 data_bkp.ab
-rw-r--r--  1 root root 1073741824 May 18 09:38 data_bkp.ac
-rw-r--r--  1 root root 1073741824 May 18 09:41 data_bkp.ad
-rw-r--r--  1 root root 1073741824 May 18 09:49 data_bkp.ae
-rw-r--r--  1 root root  904246985 May 18 09:51 data_bkp.af


# du -h data*
1.1G    data_bkp.aa
1.1G    data_bkp.ab
1.1G    data_bkp.ac
1.1G    data_bkp.ad
1.1G    data_bkp.ae
863M    data_bkp.af

This took 22 minutes and 36 seconds to complete!
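
The elapsed time can be worked out from the two `date` lines in the experiment. This is only a sketch of the arithmetic, assuming GNU `date` (for `-d`); both timestamps are parsed as UTC because only their difference matters.

```shell
# Timestamps copied from the experiment output above.
start=$(date -u -d "2016-05-18 09:28:32" +%s)
end=$(date -u -d "2016-05-18 09:51:08" +%s)

# Difference in seconds, then split into minutes and seconds.
elapsed=$(( end - start ))
printf '%d min %d s\n' $(( elapsed / 60 )) $(( elapsed % 60 ))   # prints "22 min 36 s"
```

Prefixing the pipeline with the shell's `time` keyword would report the same figure without the manual subtraction.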


Solution

  • I noticed that during the tar process only one of my four CPU cores was fully used, and the tar process was the only one using much CPU.

    So I tried parallel compression.

    I found two parallel compression tools, PIGZ and PBZIP2. For me, PIGZ works great.

    For 22 GB of test files (mostly 10MB files, so high in count rather than in size), normal tar took 23~24 minutes, pbzip2 took about the same time (I did not investigate this much), and pigz took 8 minutes! So I chose pigz.

    Once I switched to pigz, all of my CPUs went to 95% to 100%, which slowed down other processes. After some googling I found a tool to limit this CPU usage: CPULIMIT.

    Finally, I used it like this:

    "$CPULIMIT_PATH" -i -l "$CPU_LIMIT_VALUE" "$TAR_PATH" -I "$PIGZ_PATH" \
    --ignore-failed-read -cf sda1.backup.tar.gz data/
    

    -i  include all child processes; this is important, otherwise the CPU usage stays the same
    -l  CPU limit as a percentage; for this I used

    CPU_LIMIT_VALUE=$(echo "$(nproc)*45" | bc);
    

    This gives 45% of every core, i.e. 90 for 2 cores, 180 for 4 cores, and so on.
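
The limit arithmetic and the compressor choice can be sketched together, and the compressed stream can still be piped through `split` to get size-limited pieces as in the question. This is a rough sketch, not the exact script I ran: `cpu_limit_for` is a hypothetical helper, the gzip fallback is an assumption so that it runs where pigz is not installed, and the temp directory, `sample.bin` and the 100 KB piece size are demo values (the real job would use `data/` and `--bytes=100GB`).

```shell
# Hypothetical helper: cpulimit -l value = cores * percent per core (default 45).
cpu_limit_for() {
    echo $(( $1 * ${2:-45} ))
}

cpu_limit_for 2    # 2 cores: prints 90
cpu_limit_for 4    # 4 cores: prints 180
CPU_LIMIT_VALUE=$(cpu_limit_for "$(nproc)")

# Prefer pigz (multi-threaded); fall back to gzip if it is not installed.
if command -v pigz >/dev/null 2>&1; then COMPRESSOR=pigz; else COMPRESSOR=gzip; fi

# Demo data in a temp directory.
demo=$(mktemp -d)
mkdir "$demo/data"
head -c 200000 /dev/urandom > "$demo/data/sample.bin"

# tar -I hands compression to the chosen program; split cuts the stream into pieces.
tar -I "$COMPRESSOR" --ignore-failed-read -cf - -C "$demo" data \
    | split --bytes=100000 - "$demo/backup.tar.gz."

piece_count=$(ls "$demo"/backup.tar.gz.* | wc -l)
echo "limit: $CPU_LIMIT_VALUE, compressor: $COMPRESSOR, pieces: $piece_count"
```

Since pigz writes ordinary gzip output, the pieces can be restored with the same `cat ... | tar xzvf -` command shown in the question.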