
Copying files from a series of directories based off a list in a text file


I am trying to use either rsync or cp in a for loop to copy about 200 files whose names are listed, one per line, in a .txt file. The names correspond to files with the .pdbqt extension that sit in a series of subdirectories under one parent folder. The .txt file looks as follows:

file01
file02
file08
file75
file45
...

I have attempted to use rsync with the following command:

rsync -a /home/ubuntu/Project/files/pdbqt/*/*.pdbqt \
--files-from=/home/ubuntu/Project/working/output.txt \
/home/ubuntu/Project/files/top/

When I run the rsync command I receive:

rsync error: syntax or usage error (code 1) at options.c(2346) [client=3.1.2]

I have written a bash script as follows in an attempt to get that to work:

#!/bin/bash
for i in "$(cat /home/ubuntu/Project/working/output.txt | tr '\n' '')"; do
    cp /home/ubuntu/Project/files/pdbqt/*/"$i".pdbqt /home/ubuntu/Project/files/top/;
done

I understand cat isn't a great command to use here, but I could not figure out an alternative, as I am still new to bash. Running the script, I get the following errors:

tr: when not truncating set1, string2 must be non-empty
cp: cannot stat '/home/ubuntu/Project/files/pdbqt/*/.pdbqt': No such file or directory

I assume the cp error is a consequence of the tr error, but I am not sure how else to get rid of the \n read from the newline-separated list.

The expected result is that, of the 12000 .pdbqt files in the subdirectories of /pdbqt/, the 200 files named in output.txt are copied into the /top/ directory.


Solution

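  • for loops are good when your data is already in shell variables. When reading data from a file, while ... read loops work better. (The tr error itself arises because tr '\n' '' passes an empty second set; deleting characters is done with tr -d '\n'.) In your case, try: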

    while IFS= read -r file; do  cp -i -- /home/ubuntu/Project/files/pdbqt/*/"$file".pdbqt  /home/ubuntu/Project/files/top/; done </home/ubuntu/Project/working/output.txt
    

    or, if you find the multiline version more readable:

    while IFS= read -r file
    do
        cp -i -- /home/ubuntu/Project/files/pdbqt/*/"$file".pdbqt /home/ubuntu/Project/files/top/
    done </home/ubuntu/Project/working/output.txt
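
To illustrate the difference between the two loop styles, here is a minimal sketch. It assumes a small temporary list file standing in for output.txt; the variable names are made up for the demonstration. Because the command substitution in the question's for loop is quoted, its whole output arrives as a single word and the body runs once, whereas read hands the loop one line per iteration.

```shell
#!/bin/bash
# Hypothetical sample list standing in for output.txt.
list=$(mktemp)
printf '%s\n' file01 file02 file08 > "$list"

# Quoting the command substitution yields ONE word,
# so the loop body runs a single time.
for_count=0
for i in "$(cat "$list")"; do
    for_count=$((for_count + 1))
done

# read consumes one line per iteration,
# so the body runs once per name in the list.
while_count=0
while IFS= read -r name; do
    while_count=$((while_count + 1))
done < "$list"

echo "for iterations:   $for_count"     # prints 1
echo "while iterations: $while_count"   # prints 3
rm -f "$list"
```
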
    

    How it works

    • while IFS= read -r file; do

      This starts a while loop that reads one line at a time. IFS= tells read not to strip leading and trailing whitespace from the line, and -r tells read to treat backslashes literally rather than as escape characters. Each line is stored in the shell variable called file.

    • cp -i -- /home/ubuntu/Project/files/pdbqt/*/"$file".pdbqt /home/ubuntu/Project/files/top/

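      This copies the file. The * matches any subdirectory of pdbqt, so the file is found whichever subdirectory holds it. -i tells cp to ask before overwriting an existing file, and -- marks the end of cp's options.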

    • done </home/ubuntu/Project/working/output.txt

      This marks the end of the while loop and tells the shell to read the loop's input from /home/ubuntu/Project/working/output.txt.
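
If some names in the list might have no matching file, a variant of the same loop can report them instead of letting cp print a "cannot stat" error. The following is only a sketch under stated assumptions: it builds a throwaway directory tree with mktemp in place of /home/ubuntu/Project, the subdirectory names setA and setB and the sample files are invented for the demonstration, and it omits -i so that it runs without prompting.

```shell
#!/bin/bash
# Throwaway layout standing in for the real project tree (hypothetical names).
root=$(mktemp -d)
src="$root/pdbqt"; dest="$root/top"; list="$root/output.txt"
mkdir -p "$src/setA" "$src/setB" "$dest"
touch "$src/setA/file01.pdbqt" "$src/setB/file02.pdbqt" "$src/setB/file99.pdbqt"
printf '%s\n' file01 file02 file08 > "$list"

copied=0
missing=0
while IFS= read -r name; do
    [ -n "$name" ] || continue          # skip blank lines in the list
    found=0
    for path in "$src"/*/"$name".pdbqt; do
        [ -e "$path" ] || continue      # an unmatched glob stays literal; skip it
        cp -- "$path" "$dest"/
        found=1
    done
    if [ "$found" -eq 1 ]; then
        copied=$((copied + 1))
    else
        missing=$((missing + 1))
        echo "no match for: $name" >&2
    fi
done < "$list"

echo "copied=$copied missing=$missing"   # prints copied=2 missing=1
# rm -rf "$root"   # clean up the throwaway tree when done
```

To use this on the real data, src, dest, and list would point at the pdbqt directory, the top directory, and output.txt from the question. Note that if the same name exists in more than one subdirectory, the later copy overwrites the earlier one.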