Consider, for example:
```sh
mkdir dir1
mkdir dir2
cd dir1
echo "This file contains something" > a
touch b
echo "This file contains something" > c
echo "This file contains something" > d
touch e
cd ../dir2
touch a
echo "This file contains something" > b
echo "This file contains something" > c
echo "This file contains more data than the other file that has the same name but is in the other directory. BlaBlaBlaBlaBlaBlaBlaBlaBla BlaBlaBlaBlaBlaBlaBlaBlaBla BlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBla. bla!" > d
```
I would like to merge `dir1` and `dir2`. If two files have the same name, only the larger of the two should be kept. Here is the expected content of the merged directory:
```
a  # comes from dir1
b  # comes from dir2
c  # comes from either dir1 or dir2 (same size)
d  # comes from dir2
e  # comes from dir1 (empty)
```
Assuming GNU `find` (for `-printf`) and that no file name contains a newline:
```sh
mkdir -p dest
find ./dir1 ./dir2 -type f -printf '%s %p\n' \
| sort -nr \
| while read -r size file; do
    if ! [ -e "dest/${file#./*/}" ]; then
      cp -- "$file" "dest/${file#./*/}"
    fi
  done
```
For each regular file, `find` prints one line of the form `size path`, for example:

```
221 ./dir1/a
1002 ./dir1/b
11 ./dir2/a
```
Then `sort -nr` sorts these lines numerically in reverse order, so the largest files come first:

```
1002 ./dir1/b
221 ./dir1/a
11 ./dir2/a
```
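A quick way to see why the `-n` flag matters is to feed the three sample lines above to `sort` with and without it. This is only an illustration with those example lines; `LC_ALL=C` is set so that the non-numeric comparison is plain byte order.

```sh
#!/bin/sh
# The sample "size path" lines from above.
lines='221 ./dir1/a
1002 ./dir1/b
11 ./dir2/a'

# -n compares the leading number numerically; -r puts the largest first.
sorted=$(printf '%s\n' "$lines" | sort -nr)

# Without -n the comparison is character by character, so "221" sorts
# above "1002" and the size order is lost.
lexical=$(printf '%s\n' "$lines" | LC_ALL=C sort -r)

printf 'numeric:\n%s\n\nlexical:\n%s\n' "$sorted" "$lexical"
```

The numeric run lists `1002`, `221`, `11`; the lexical run lists `221`, `11`, `1002`, which would make the loop keep the wrong file.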
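Finally, the `while read -r size file` loop copies each file to `dest/${file#./*/}` unless that destination already exists. Because the list is sorted by decreasing size, the largest file with a given name is copied first, and any smaller file with the same name is skipped.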
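To check the approach against the expected result, the sketch below rebuilds the example from the question in a scratch directory (created with `mktemp -d`) and runs the loop there. It assumes GNU `find`, and the long text of `dir2/d` is replaced by a shorter stand-in that is still longer than the one-line files.

```sh
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"

mkdir dir1 dir2 dest
# dir1: a, c, d hold one line; b and e are empty.
for f in a c d; do echo "This file contains something" > "dir1/$f"; done
touch dir1/b dir1/e
# dir2: a is empty; b and c hold one line; d is longer
# (shortened stand-in for the longer text in the question).
touch dir2/a
for f in b c; do echo "This file contains something" > "dir2/$f"; done
echo "This file contains more data than the other file that has the same name." > dir2/d

# Largest first; the first file seen for each relative name wins.
find ./dir1 ./dir2 -type f -printf '%s %p\n' \
| sort -nr \
| while read -r size file; do
    if ! [ -e "dest/${file#./*/}" ]; then
      cp -- "$file" "dest/${file#./*/}"
    fi
  done

ls -l dest
```

`dest/a` ends up as the non-empty copy from `dir1`, `dest/b` and `dest/d` come from `dir2`, and `dest/e` is the empty file from `dir1`. For `c` both copies have the same size, so either one may win.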
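`${file#./*/}` expands to the value of `file` with the shortest prefix matching the pattern `./*/` removed, that is, the leading `./` and the top-level directory:

```
./abc/def/foo/bar.txt -> def/foo/bar.txt
```

The remaining path may still contain subdirectories (here `def/foo`), which then have to be created inside `dest` before copying: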
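The expansions used in the loop can be tried on their own with the path from the example above. The variable names `rel` and `greedy` are only illustrative; `greedy` shows the `##` form, which removes the longest matching prefix instead of the shortest.

```sh
#!/bin/sh
file=./abc/def/foo/bar.txt

rel=${file#./*/}      # shortest prefix matching ./*/ removed
greedy=${file##./*/}  # longest prefix matching ./*/ removed
dest=dest/$rel        # target path inside dest
destdir=${dest%/*}    # shortest suffix matching /* removed: the parent dir

printf '%s\n' "$rel" "$greedy" "$dest" "$destdir"
```

This prints `def/foo/bar.txt`, `bar.txt`, `dest/def/foo/bar.txt` and `dest/def/foo`, which is why the loop uses `#` rather than `##`.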
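```sh
mkdir -p dest
find ./dir1 ./dir2 -type f -printf '%s %p\n' \
| sort -nr \
| while read -r size file; do
    dest=dest/${file#./*/}   # target path inside dest
    destdir=${dest%/*}       # its parent directory
    [ -e "$dest" ] && continue
    [ -e "$destdir" ] || mkdir -p -- "$destdir"
    cp -- "$file" "$dest"
  done
```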
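If file names may contain newlines, the assumption stated before the first script no longer holds. Below is a hedged sketch of a newline-safe variant that terminates each record with a NUL byte instead. It assumes bash (for `read -d ''` and `$'...'` quoting) and GNU `find`/`sort` (for `\0` in `-printf` and `sort -z`); `merge_nul` is a hypothetical function name, and the demo runs in a scratch directory.

```bash
#!/usr/bin/env bash
set -euo pipefail

merge_nul() {
  find ./dir1 ./dir2 -type f -printf '%s %p\0' \
  | sort -z -nr \
  | while IFS= read -r -d '' record; do
      file=${record#* }            # drop the leading "size " field
      dest=dest/${file#./*/}       # path relative to dir1/ or dir2/
      [ -e "$dest" ] && continue   # a file at least as large already won
      mkdir -p -- "${dest%/*}"     # create missing parent directories
      cp -- "$file" "$dest"
    done
}

# Demo: both files share a name that contains a newline.
cd "$(mktemp -d)"
mkdir -p dir1 dir2 dest
printf 'short\n' > "dir1/odd"$'\n'"name"
printf 'much longer content\n' > "dir2/odd"$'\n'"name"
merge_nul
```

Splitting each record at the first space (`${record#* }`) keeps spaces inside the path intact, and `IFS=` stops `read` from trimming leading or trailing whitespace.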