
Merge two directories keeping larger files


Consider for example

mkdir dir1
mkdir dir2

cd dir1
echo "This file contains something" > a
touch b
echo "This file contains something" > c
echo "This file contains something" > d
touch e

cd ../dir2
touch a
echo "This file contains something" > b
echo "This file contains something" > c
echo "This file contains more data than the other file that has the same name but is in the other directory.  BlaBlaBlaBlaBlaBlaBlaBlaBla BlaBlaBlaBlaBlaBlaBlaBlaBla BlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBla. bla!" > d

I would like to merge dir1 and dir2. If two files have the same name, only the larger one must be kept. Here is the expected content of the merged directory:

a # Comes from `dir1`
b # Comes from `dir2`
c # Comes from either `dir1` or `dir2`
d # Comes from `dir2`
e # Comes from `dir1` (is empty)

Solution

  • Assuming that no file name contains a newline:

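    # Requires GNU find (-printf) and an existing dest directory.
    find . -type f -printf '%s %p\n' \
      | sort -nr \
      | while read -r size file; do
          # Largest files come first, so skip names that were already copied.
          if ! [ -e "dest/${file#./*/}" ]; then
            cp -- "$file" "dest/${file#./*/}"
          fi
        done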
    

    The output of find is a list of "filesize path":

    221 ./dir1/a
    1002 ./dir1/b
    11 ./dir2/a
    

    Then we sort the list numerically, in descending order:

    1002 ./dir1/b
    221 ./dir1/a
    11 ./dir2/a
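
    To try this step in isolation, here is a minimal sketch that builds a small throwaway tree and prints the sorted list. It assumes GNU find (-printf is not POSIX); the helper name list_by_size and the file contents are made up for illustration.

    ```shell
    # Hypothetical helper: print "size path" for every regular file
    # under the directory $1, largest first. Assumes GNU find.
    list_by_size() {
      ( cd "$1" && find . -type f -printf '%s %p\n' | sort -nr )
    }

    # Throwaway demo tree, loosely modelled on dir1/dir2 from the question.
    tmp=$(mktemp -d)
    mkdir "$tmp/dir1" "$tmp/dir2"
    printf 'short\n' > "$tmp/dir1/a"                      # 6 bytes
    : > "$tmp/dir2/a"                                     # empty
    printf 'a much longer line of text\n' > "$tmp/dir2/d" # 27 bytes

    list_by_size "$tmp"
    # prints:
    # 27 ./dir2/d
    # 6 ./dir1/a
    # 0 ./dir2/a
    ```

    Because the largest file of each name appears first, the copy loop only has to skip names that already exist in the destination.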
    

    And finally we reach the while read -r size file loop, where each file is copied to the destination dest/${file#./*/} if it doesn't already exist there.

    ${file#./*/} expands to the value of the parameter file with the leading directory removed:

    For example, ./abc/def/foo/bar.txt becomes def/foo/bar.txt, which means you might need to create the directory def/foo inside the dest directory first:

      | while read -r size file; do
        dest=dest/${file#./*/}
        destdir=${dest%/*}
        [ -e "$dest" ] && continue
        [ -e "$destdir" ] || mkdir -p -- "$destdir"
        cp -- "$file" "$dest"
        done
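
    Putting the pieces together, the whole approach could be wrapped in a function along these lines. This is only a sketch under a few assumptions: GNU find is available, no file name contains a newline, and the destination lies outside the tree being scanned. The name merge_keep_larger and its two parameters are hypothetical, not part of the original answer.

    ```shell
    # Hypothetical wrapper: merge every subdirectory of $1 into $2,
    # keeping the larger file whenever a relative path occurs more than once.
    merge_keep_larger() {
      local src=$1 dest=$2 size file rel target
      mkdir -p -- "$dest"
      # -mindepth 2 skips files directly in $src, which have no leading
      # directory for ${file#./*/} to strip.
      ( cd "$src" && find . -mindepth 2 -type f -printf '%s %p\n' ) \
        | sort -nr \
        | while read -r size file; do
            rel=${file#./*/}              # ./dir1/sub/x -> sub/x
            target=$dest/$rel
            [ -e "$target" ] && continue  # a larger (or equal) copy is already there
            mkdir -p -- "${target%/*}"    # create sub/ under dest if needed
            cp -- "$src/$file" "$target"
          done
    }
    ```

    From the directory that contains dir1 and dir2, it could be called as merge_keep_larger . /path/to/merged, where /path/to/merged is a placeholder. Note that every subdirectory of the source takes part in the merge, not only dir1 and dir2.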