
Rsync on multiple hosts in parallel


I frequently need to send a lot of files to multiple hosts. It is crucial that this is fast, so I want to do it in parallel.

How can I run rsync to multiple hosts in parallel from a bash script?

Right now the script looks like this:

   for i in ${listofhosts[*]}
   do
   rsync -rv --checksum folder/ -e "ssh -i rsa_key -o StrictHostKeyChecking=no" user@$i:/var/test/folder --delete || exit 1
   done

Edit: I'm thinking of something with GNU Parallel or xargs, but I don't know how to use them in this situation.


Solution

  • With just a shell script:

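    #!/bin/bash
    # assumes listofhosts has already been populated, as in the question
    procs=()
    for i in "${listofhosts[@]}"; do  # notice syntax fixes
      # start each transfer in the background and remember its PID
      rsync -rv --checksum folder/ -e "ssh -i rsa_key -o StrictHostKeyChecking=no" \
        user@"$i":/var/test/folder --delete &
      procs+=("$!")
    done
    # wait for every transfer, remembering whether any of them failed
    status=0
    for proc in "${procs[@]}"; do
      wait "$proc" || status=1
    done
    exit "$status"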

    The obvious drawback is that you can't cancel the others as soon as one of them fails. Also, this starts one rsync per host all at once, so if you really have "a lot" of hosts, it will probably saturate your network bandwidth to the point where you regret asking how to do this.
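If you want to know *which* hosts failed with the plain-shell approach, one option is to wait on each PID separately, because `wait PID` returns the exit status of that particular background job. This is a minimal sketch, assuming the PIDs and host names are kept in two arrays with matching indices; `fake_sync` and the host names are hypothetical stand-ins so it runs without any network access. In real use, the `fake_sync` call would be the rsync line from the script above.

```bash
#!/bin/bash
# Sketch: start one background job per host, then wait on each PID
# individually so we know exactly which hosts failed.
# fake_sync is a hypothetical stand-in for the real rsync command;
# it simply "fails" for hostB.
fake_sync() {
  [ "$1" != hostB ]
}

listofhosts=(hostA hostB hostC)   # hypothetical host names

pids=()
for host in "${listofhosts[@]}"; do
  fake_sync "$host" &
  pids+=("$!")                    # pids[i] belongs to listofhosts[i]
done

failed_hosts=()
for i in "${!listofhosts[@]}"; do
  # "wait PID" returns the exit status of that particular job
  if ! wait "${pids[$i]}"; then
    failed_hosts+=("${listofhosts[$i]}")
  fi
done

echo "failed hosts: ${failed_hosts[*]:-none}"
# a real script would typically end with: [ "${#failed_hosts[@]}" -eq 0 ]
```

Note that this still does not cancel the remaining transfers early; it only reports per-host results once everything has finished.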

    With xargs, you can limit how many instances you run:

    # probably better if you have the hosts in a file instead of an array actually,
    # and simply run xargs <filename -P 17 -n 1 ...
    printf '%s\n' "${listofhosts[@]}" |
    xargs -P 17 -n 1 sh -c 'rsync -rv --checksum folder/ \
        -e "ssh -i rsa_key -o StrictHostKeyChecking=no" \
        user@"$0":/var/test/folder --delete || exit 1'

    Notice how we sneakily smuggle in the host as $0: with sh -c, the first argument after the command string becomes $0. You could equivalently, but slightly less obscurely, populate $0 with a dummy string and use $1, but it doesn't really make much difference here.
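For comparison, here is a small sketch of the less obscure variant, where a dummy `_` fills $0 and each host arrives in $1. `echo` stands in for rsync and the host names are hypothetical; `-P` is left out only so that the output order is predictable.

```bash
#!/bin/bash
# Sketch: "_" is a throwaway value for $0, so xargs' appended host becomes $1.
# echo is a stand-in for the real rsync command line.
listofhosts=(hostA hostB hostC)   # hypothetical host names

result=$(printf '%s\n' "${listofhosts[@]}" |
  xargs -n 1 sh -c 'echo "would sync to user@$1:/var/test/folder"' _)
printf '%s\n' "$result"
```

Each invocation effectively runs `sh -c '...' _ hostA`, `sh -c '...' _ hostB`, and so on.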

    The -P 17 says to run a maximum of 17 processes in parallel (obviously, tweak to your liking), and -n 1 says to pass only one argument (here, one host name) to each invocation of the command. xargs still does not offer a way to interrupt the entire batch if one of the processes fails, and it only reports a summary result code: the exit code from xargs will be non-zero if at least one of the processes failed, but it won't tell you which one.
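The failure behaviour can be tried out without touching a real host. This sketch uses `sh -c` stand-ins (hypothetical, not rsync) that fail only for hostB. The first part shows the summary exit status. The second part shows a related GNU xargs behaviour: if an invocation exits with status 255, xargs stops starting new commands, although commands that are already running are not killed. Ending the rsync command with `|| exit 255` instead of `|| exit 1` would therefore at least stop the batch from growing after a failure.

```bash
#!/bin/bash
# Sketch: how xargs reports failures. The sh -c commands are hypothetical
# stand-ins for rsync that "fail" for hostB only.

# 1) An ordinary failure does not stop xargs; it only makes the overall
#    exit status non-zero (GNU xargs uses 123 for this case).
printf '%s\n' hostA hostB hostC |
  xargs -P 2 -n 1 sh -c '[ "$0" != hostB ]'
summary_status=$?
echo "xargs exit status: $summary_status"

# 2) Exit status 255 makes xargs stop starting new commands. With -P 1 the
#    hosts run one at a time, so hostC is never started. xargs' own error
#    message on stderr is discarded here.
started=$(printf '%s\n' hostA hostB hostC |
  xargs -P 1 -n 1 sh -c 'echo "$0"; [ "$0" != hostB ] || exit 255' 2>/dev/null)
echo "started:"
printf '%s\n' "$started"
```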

    If you want to keep track of which ones failed, perhaps have the script print that out to a separate file.

    rm -f failures.txt
    printf '%s\n' "${listofhosts[@]}" |
    xargs -P 17 -n 1 sh -c 'rsync -rv --checksum folder/ \
        -e "ssh -i rsa_key -o StrictHostKeyChecking=no" \
        user@"$0":/var/test/folder --delete && exit 0
        echo "$0" failed: $? >>failures.txt'

    If this is for subsequent processing, it's probably better to write the results to the file in machine-readable form, maybe simply echo $? $0, so you can loop over it with

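    # default IFS splits each line into the exit code and the host name
    while read -r exitcode hostname; do
       : # process "$exitcode" and "$hostname" here
    done <failures.txt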

    or perhaps use a standard format like CSV or JSON.
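Putting the machine-readable variant together, a minimal sketch might look like this. It assumes a scratch directory for failures.txt, and the subshell ending in `exit 23` is a hypothetical stand-in for a failing rsync (23 is just an arbitrary non-zero code here). Creating an empty failures.txt up front, rather than deleting it, means the read loop also works when nothing failed.

```bash
#!/bin/bash
# Sketch: record one "exitcode hostname" line per failed host, then read it back.
cd "$(mktemp -d)" || exit 1   # scratch directory so failures.txt starts fresh
: >failures.txt               # start with an empty file

# The subshell is a hypothetical stand-in for rsync; it fails only for hostB.
printf '%s\n' hostA hostB hostC |
  xargs -P 3 -n 1 sh -c '
    ( [ "$0" != hostB ] || exit 23 ) && exit 0
    echo "$? $0" >>failures.txt'

# default IFS splits each line on whitespace into the two fields
failed_hosts=()
while read -r exitcode hostname; do
  echo "$hostname failed with exit code $exitcode"
  failed_hosts+=("$hostname")
done <failures.txt
```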