
How can I use bash to copy all files in a directory and its sub-directories, excluding those named in a list, while flattening the tree structure?


I have a directory (called 'source') that contains sub-directories and files. Using bash, I need to copy all files (and only files, not directories) found in this directory and each of its sub-directories to a different directory (called 'destination'). The directory tree must not be maintained; it must be flattened. Only files whose names are not listed in a text file (called 'excluded.txt') must be copied.

Source input examples:

/home/source/AAA/file1.xyz 
/home/source/AAA/GGG/file2.xyz
/home/source/BBB/file3.tuv
/home/source/BBB/HHH/file4.tuv

Destination output examples:

/home/destination/file1.xyz
/home/destination/file2.xyz
/home/destination/file3.tuv
/home/destination/file4.tuv

Once the files have been copied, their filenames (file1.xyz, etc.) are added to excluded.txt, one filename per line. The files will then be removed from the destination directory periodically.

If the bash script is executed again, and source files are present, they should not be copied to destination if their filenames appear in the excluded.txt file.

My attempts with "cp" and "rsync" failed because the directory tree structure was maintained. My attempts with "find" also failed, because I could not check the results against the "excluded.txt" list before taking the copy action.


Solution

  • The answer provided by @Aserre was instrumental in finding this solution. His solution works for all filenames that do not contain spaces. After reading about eval (evaluating/executing strings), string concatenation, and how to read entire lines into variables, I was able to write and execute the following code successfully.

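    # Start from an empty list of find tests.
    exclude=""
    # IFS= and -r keep each line verbatim (leading spaces and backslashes included).
    while IFS= read -r line
    do
        name="$line"
        # Append one quoted "! -name" test per excluded filename.
        exclude="$exclude ! -name \"$name\""
    done < "/mnt/destination/exclude.txt"
    cmd1="find \"/home/source\" -type f "
    cmd2=" -exec cp -n {} \"/home/destination\" \;"
    result=$cmd1$exclude$cmd2
    # Quote the string so it reaches eval unchanged (no word splitting or globbing first).
    eval "$result"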
    

    Explanation (credit to @Aserre):

    • read -r line : inside the while loop, reads the next line of exclude.txt on each pass. The "-r" flag stops read from treating backslashes as escape characters, so each backslash stays part of the line.
    • name="$line" : the entire line from exclude.txt is stored in a new string called "name".
    • exclude="$exclude ! -name \"$name\"" : builds up ! -name "file1" ! -name "file2" ! -name... in a string called "exclude", one ! -name test per excluded filename. The backslash before each inner quotation mark keeps that quote as a literal character in the string, so that eval later treats a filename containing spaces as a single argument.
    • cmd1= : stores the first part of the find command (the next two items) in a string called "cmd1".
    • find /home/source : the path of the root directory to search. The search is recursive.
    • -type f : match regular files only.
    • -exec cp -n {} /home/destination \; : action executed for each found item. {} represents the item that was found, -n stops cp from overwriting a file that already exists in the destination, and the escaped semicolon ends the -exec action.
    • cmd2= : stores the -exec action above in a string called "cmd2".
    • result=$cmd1$exclude$cmd2 : concatenates the three strings into a single find command.
    • eval : takes the string named "result" and runs it as a command.
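
One possible variation, sketched here on the assumption that bash (rather than plain sh) is available, collects the ! -name tests in an array instead of a string. Each array element reaches find as its own argument, so no eval or escaped quotes are needed, and filenames containing spaces, quotation marks, or dollar signs are passed through unchanged. The function name copy_flat and its three parameters are hypothetical and are not part of the original answer.

```shell
# Sketch only: copy_flat, src, dest and list are illustrative names.
# Copies every regular file under "src" into the single directory "dest",
# skipping any file whose basename is listed (one per line) in "list".
copy_flat() {
    local src=$1 dest=$2 list=$3
    local -a exclude=()
    local name

    # Build one "! -name <name>" pair per non-empty line of the list.
    # The "|| [[ -n $name ]]" part also handles a last line without a newline.
    if [[ -f $list ]]; then
        while IFS= read -r name || [[ -n $name ]]; do
            [[ -n $name ]] && exclude+=( ! -name "$name" )
        done < "$list"
    fi

    # cp -n does not overwrite a file that already exists in the destination.
    find "$src" -type f "${exclude[@]}" -exec cp -n {} "$dest" \;
}
```

With the paths used above, it could be called as copy_flat /home/source /home/destination /mnt/destination/exclude.txt. As with the string-based version, find interprets each -name argument as a shell pattern, so a listed name containing *, ? or [ is matched as a pattern rather than literally.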