Tags: bash, unix, rsync, scp

Using (scp | rsync) to pull specific files while creating folder structure at same time?


I have a large project hosted on a server. I want to copy only certain files from it to my local machine, keeping the same folder structure. My current command to find these files (run from the project root on the server) is:

find ./ -type f -name '*_out.csv' ! -path './*/doc/*' 2>/dev/null

Which produces a list like this (truncated for brevity):

./validation/Riso_AN8/analysis/Riso_AN8_out.csv
./validation/FUMEXII_Regate/analysis/Regate_smeared_out.csv
./validation/FUMEXII_Regate/analysis/Regate_discrete_out.csv
./validation/IFA_432/analysis/rod3/IFA_432_rod3_out.csv
./validation/IFA_432/analysis/rod1/IFA_432_rod1_out.csv
./validation/IFA_432/analysis/rod2/IFA_432_rod2_out.csv
./validation/LOCA_REBEKA_cladding_burst_tests/analysis/rebeka_2d_06MPa/rebeka_singlerod_2d_06MPa_out.csv
./validation/LOCA_REBEKA_cladding_burst_tests/analysis/rebeka_2d_06MPa/rebeka_singlerod_2d_06MPa_tm_out.csv
./validation/LOCA_REBEKA_cladding_burst_tests/analysis/rebeka_2d_08MPa/rebeka_singlerod_2d_08MPa_tm_out.csv

I would like to use scp or rsync to pull these files to my local machine and create the folder structure with nothing else in it. What would be the best way to go about this? There are a lot of files, so I don't want to create the folder structure by hand beforehand. I also can't pull the entire project from the server because it's huge and the system admins will get mad at me.

Is there a way to pull these files while simultaneously creating the folder structure on my local machine?


Solution

  • I would recommend rsync. One approach is to run find on the server via ssh, starting from the base directory, and read its output through a process substitution. A while loop reads each filename found on the server, extracts its directory part, and creates that directory on the local machine below the current directory using mkdir -p (checking for failure). rsync -uav then pulls the file from the server into that directory. For example, you could do something similar to:

    #!/bin/bash
    
    server=${1:-yourserver}       ## your server name
    basedir=${2:-/path/to/files}  ## the base directory to run find on server
    
    while read -r line; do        ## read line at a time from find output on server
        dname="${line%/*}"        ## separate directory name (relative, e.g. ./a/b)
        mkdir -p "$dname" || {    ## create/validate directory from remote file
            printf "error: unable to create '%s'.\n" "$dname" >&2
            continue
        }
        rsync -uav "$server:$basedir/$line" "$dname/" ## rsync file to correct directory
    ## cd into basedir first so find prints ./relative paths and the doc exclusion matches
    done < <(ssh "$server" "cd '$basedir' && find . -type f -name '*_out.csv' ! -path './*/doc/*' 2>/dev/null")
    

    Then call the script on the local machine, passing the server name as the first argument and the base directory where the files are located on the server as the second. Before running it, change to the local directory under which you want the remote directory structure created. This assumes your find call (executed on the server by ssh from the local machine) returns the list of files you wish to copy to your local machine.

    This is not nearly as efficient as a single rsync call, because every file gets its own rsync invocation and connection. However, if your find command returns files that sit several directory levels deep in the remote tree, and those directories do not yet exist on the local machine, then with one rsync call per file you have to create those paths yourself before calling rsync on the remote file.
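
As an alternative to the per-file loop, the file list can be handed to a single rsync call. The sketch below is not part of the answer above. It relies on rsync's --files-from option, which reads paths relative to the source directory and implies --relative, so rsync creates the intermediate directories itself. The function name pull_outputs and its three arguments are hypothetical, and the sketch assumes key-based ssh access (ssh and rsync would otherwise prompt for a password at the same time) and rsync installed on both machines.

```shell
#!/bin/bash
# Sketch only: pull_outputs is a hypothetical helper, not from the original answer.

pull_outputs() {
    local server=$1     # remote host (placeholder, e.g. yourserver)
    local basedir=$2    # project root on the server
    local dest=${3:-.}  # local directory to recreate the tree under

    # Run find inside basedir so it prints ./relative paths, then pass that
    # list on stdin to one rsync call. --files-from=- reads the list from
    # stdin; each path is taken relative to the source "$server:$basedir/".
    ssh "$server" "cd '$basedir' && find . -type f -name '*_out.csv' ! -path './*/doc/*' 2>/dev/null" |
        rsync -uav --files-from=- "$server:$basedir/" "$dest/"
}

# Usage (placeholder values):
#   pull_outputs yourserver /path/to/files ./local_copy
```

Compared with the loop, this opens one connection for listing and one for the transfer, regardless of how many files match. Adding -n (--dry-run) to the rsync options is a cheap way to check the list before copying anything.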