RSYNC with many files using `files-from`


I have many files named exported-0.txt, exported-1.txt, exported-2.txt, and so on (sequential suffixes). Each file contains one absolute file path per line. For example, exported-1.txt looks like this:

/directory1/img.png
/aabb/file.csv
/magic/file/boo/aaa/cc.jpg
...

I understand that rsync has a --files-from option, which makes it read the list of files to transfer from the given file and synchronize them to another server. The problem is that I have thousands of exported-N.txt files, and each one has thousands of lines.

So I am wondering what the best approach is to run rsync in parallel, for example 5 at a time, with each call working on a different file that contains thousands of paths to be synchronized.

I have almost no knowledge of the Linux command line, so I have no idea where to start. I am wondering if I can use xargs to generate the numbers 1-9999 so that it calls rsync on each file name plus number, but I can't find a way to do that. Any suggestions?


Solution

  • Here's a bash solution for processing your exported-*.txt files with n concurrent rsyncs.

    The main idea is to split the exported-*.txt files into n groups of nearly equal size, concatenate each group inside a process substitution <(...), and pass that as the file argument of --files-from:

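    #!/bin/bash
    shopt -s nullglob    # an unmatched glob expands to nothing, not to itself
    
    n=5                  # number of concurrent rsync processes
    
    arr=( exported-*.txt )
    
    # Split the files into n groups: the first rem groups get one extra file.
    (( len = ${#arr[@]} / n + 1 ))
    (( rem = ${#arr[@]} % n ))
    
    idx=0
    while (( idx < ${#arr[@]} ))
    do
        # Once the rem larger groups are done, fall back to the smaller size.
        (( rem-- == 0 )) && (( len-- ))
        # The listed paths are absolute, so the source directory is "/".
        # /your/remote/dir/ is a placeholder for the real destination.
        rsync -a --files-from=<(cat -- "${arr[@]:idx:len}") / /your/remote/dir/ &
        (( idx += len ))
    done
    
    wait    # block until every background rsync has finished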
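
Since the question mentions xargs, here is a minimal sketch of that alternative, not a replacement for the script above. It starts one rsync per list file and lets xargs keep at most a given number running at once. The function name sync_lists and the RSYNC override variable are hypothetical names introduced for illustration, and it assumes an xargs that supports -0, -P and -I (GNU xargs, as shipped with CentOS, does).

```bash
#!/bin/bash
# Sketch: one rsync per list file, at most "jobs" running at once.
# sync_lists and the RSYNC override are hypothetical names for illustration.

sync_lists() {
    local jobs=$1 dest=$2
    shift 2
    (( $# )) || return 0    # nothing to do without list files

    # NUL-delimited names keep unusual characters in file names intact.
    # The source is "/" because the listed paths are absolute.
    printf '%s\0' "$@" |
        xargs -0 -P "$jobs" -I{} "${RSYNC:-rsync}" -a --files-from={} / "$dest"
}
```

It could be called as sync_lists 5 /your/remote/dir/ exported-*.txt (with shopt -s nullglob set, as in the script above). The trade-off is that every list file pays the start-up cost of its own rsync process, and of its own connection when the destination is remote, whereas the script above starts only n rsync processes in total. In exchange, a slow list file does not hold back the others, because xargs starts the next rsync as soon as a slot frees up.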