I would like to identify the identical files in a directory tree of a Synology NAS.
Is there a way to do it robustly and efficiently?
Here's what I tried:
basedir=/volume1/bordel
find "$basedir" -type f -exec md5sum {} + |
sort -k1,1 |
uniq -d
But I get no output, which should be impossible.
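uniq -d compares whole lines, and every md5sum line ends with a different filename, so no two lines are ever identical and nothing is printed. If your uniq supports the -D and -w options, you can restrict the comparison to the checksum: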
find . -type f -exec md5sum {} + |
sed 's/^\\\(.*\)/\1\\/' |
sort -k1,1 |
uniq -w32 -D |
sed 's/\(.*\)\\$/\\\1/'
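The sed commands handle md5sum output lines that begin with a backslash character. In some versions of md5sum, a line begins with a backslash if the filename contains a newline or backslash character (those characters are then escaped with backslashes in the filename, as \n and \\). The first sed moves that leading backslash to the end of the line, so the checksum starts in the first column for sort and uniq; the second sed moves it back to the front.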
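To see what the two sed commands do, here is a small round trip on a single made-up line. The checksum and the filename a\b are only illustrative; the line is written the way an escaping md5sum would print it.

```shell
# Hypothetical md5sum line for a file named a\b (escaped as a\\b, with a leading backslash)
line='\d41d8cd98f00b204e9800998ecf8427e  a\\b'

# First sed: move the leading backslash to the end of the line
moved=$(printf '%s\n' "$line" | sed 's/^\\\(.*\)/\1\\/')
printf '%s\n' "$moved"

# Second sed: move the trailing backslash back to the front
restored=$(printf '%s\n' "$moved" | sed 's/\(.*\)\\$/\\\1/')
printf '%s\n' "$restored"
```

After the first step the checksum occupies the first 32 characters, which is what uniq -w32 needs; after the second step the line is back in its original form.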
The -w32 option of uniq compares only the first 32 characters of each line (the length of an MD5 checksum in hexadecimal), and the -D option prints all lines that are duplicated in those first 32 characters. Both options are GNU extensions.
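If the uniq on your NAS turns out not to support -D or -w, the same grouping can be sketched with awk, which is specified by POSIX. This is only a rough alternative, not a drop-in replacement: the function name list_dupes is made up for illustration, and it does not handle the backslash-prefixed lines described above, so files whose names contain a newline or backslash may be missed.

```shell
# list_dupes DIR: print every md5sum line whose checksum occurs more than once.
# (list_dupes is a hypothetical helper name, not an existing tool.)
list_dupes() {
    find "$1" -type f -exec md5sum {} + |
    sort |
    awk '
        {
            sum = substr($0, 1, 32)        # an MD5 hex checksum is 32 characters
            if (sum == prev) {
                if (!shown) print held     # first line of the group, printed once
                print                      # current line shares the checksum
                shown = 1
            } else {
                shown = 0                  # a new checksum starts a new group
            }
            prev = sum
            held = $0
        }'
}
```

You would call it as list_dupes "$basedir". Because the input is sorted, lines with the same checksum are adjacent, so awk only has to remember the previous line.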