
How to find duplicate filenames (recursively) in a given directory?


I need to find all duplicate filenames in a given directory tree. I don't know which directory tree the user will pass as a script argument, so I don't know the directory hierarchy in advance. I tried this:

#!/bin/sh
find . -type f | while IFS= read -r vo
do
basename "$vo"
done

But that's not really what I want. It finds only one duplicate and then stops, even if there are more duplicate filenames. Also, it doesn't print the whole path (only the filename), and it doesn't print a duplicate count. I wanted to do something similar to this command:

find DIRNAME | tr '[A-Z]' '[a-z]' | sort | uniq -c | grep -v " 1 " 

But it doesn't work for me, and I don't know why: even when there are duplicates, it prints nothing.
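The pipeline above prints nothing because find outputs full paths, and no two files in one tree share a full path, so uniq -c never counts more than 1 and grep -v " 1 " removes every line. Stripping the directory part before sorting fixes that. Below is a minimal sketch that keeps the question's lowercase step (so names differing only in case count as duplicates) and assumes no file name contains a newline. The function name dup_names and the default directory "." are illustrative choices, not part of the question. It uses awk '$1 > 1' instead of grep -v " 1 ", which would also drop any line whose file name happens to contain " 1 ".

```shell
#!/bin/sh
# Print each file name that occurs more than once under a directory,
# together with its count. Names are compared case-insensitively, as
# in the question's tr step. Assumes no file name contains a newline.
dup_names() {
    # find prints full paths; strip everything up to the last "/"
    # so that only the file names themselves are compared.
    find "$1" -type f |
        sed 's_.*/__' |
        tr '[:upper:]' '[:lower:]' |
        sort |
        uniq -c |
        awk '$1 > 1'   # keep only names counted more than once
}

# The directory comes from the first script argument (default: ".").
dup_names "${1:-.}"
```

Run it as `./script.sh DIRNAME`; each output line is a count followed by the lowercased name.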


Solution

  • Here is another solution (based on the suggestion by @jim-mcnamara) without awk:

    Solution 1

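    #!/bin/sh
    dirname=/path/to/directory
    # List every file name (path stripped) that occurs more than once
    find "$dirname" -type f | sed 's_.*/__' | sort | uniq -d |
    while IFS= read -r fileName
    do
        # Print the full path of every file with exactly that name.
        # Note: -name treats *, ? and [ in the name as glob characters.
        find "$dirname" -type f -name "$fileName"
    done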
    

    However, this walks the directory tree again for every duplicated name, which can become very slow on large trees. Saving the find results in a temporary file and searching that instead avoids the repeated traversal.

    Solution 2 (with temporary file)

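    #!/bin/sh
    dirname=/path/to/directory
    # mktemp creates a uniquely named temporary file
    tempfile=$(mktemp) || exit 1
    find "$dirname" -type f > "$tempfile"
    sed 's_.*/__' "$tempfile" | sort | uniq -d |
    while IFS= read -r fileName
    do
        # Escape BRE metacharacters and anchor at the end of the line,
        # so only paths whose last component is exactly $fileName match
        pattern=$(printf '%s\n' "$fileName" | sed 's/[][\.*^$]/\\&/g')
        grep -- "/$pattern\$" "$tempfile"
    done
    rm -f "$tempfile"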
    

    Because you may not always want to write a temporary file to disk, choose whichever method fits your needs. Both examples print the full path of every file whose name is duplicated.
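    If you want neither a repeated search nor a temporary file, a single pass is possible once you drop the "without awk" constraint: awk can group the full paths by their last path component in memory and then print each duplicated name with its count and all of its paths, which is also the output the question asked for. This is a swapped-in technique, not one of the two solutions above. A minimal sketch, assuming no file name contains a newline; the function name dup_report is hypothetical, names are compared exactly (case-sensitively), and the order of the groups is unspecified because it follows awk's "for (k in a)" iteration order.

```shell
#!/bin/sh
# Single pass: group full paths by file name in memory, then print
# every name that occurs more than once, its count, and its paths.
# Assumes no file name contains a newline. Group order is unspecified.
dup_report() {
    find "$1" -type f | awk -F/ '
        {
            count[$NF]++                          # $NF = last path component
            paths[$NF] = paths[$NF] "\n    " $0   # collect full paths
        }
        END {
            for (name in count)
                if (count[name] > 1)
                    printf "%s (%d)%s\n", name, count[name], paths[name]
        }'
}

dup_report "${1:-.}"
```

    Each group is printed as a header line such as `foo.txt (2)` followed by one indented full path per file.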

    Bonus question: is it possible to save the whole output of the find command as a list in a variable?
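    In POSIX sh, command substitution (files=$(find ...)) stores the whole output in one variable as a newline-separated string, with trailing newlines stripped, and printf '%s\n' "$files" replays it for each search. That way find runs only once and nothing is written to disk. Below is a minimal sketch reworking Solution 2 along those lines, assuming no file name contains a newline; the function name dup_paths and the default directory "." are hypothetical choices. In bash 4 or later, `mapfile -t files < <(find "$dirname" -type f)` would read the list into an array instead.

```shell
#!/bin/sh
# Like Solution 2, but the find output is kept in a shell variable
# instead of a temporary file. Assumes no file name contains a newline.
dup_paths() {
    files=$(find "$1" -type f)            # whole list, newline-separated
    [ -n "$files" ] || return 0           # empty tree: nothing to report
    printf '%s\n' "$files" | sed 's_.*/__' | sort | uniq -d |
    while IFS= read -r fileName
    do
        # Escape BRE metacharacters and anchor at the end of the line,
        # so only paths whose last component is exactly $fileName match.
        pattern=$(printf '%s\n' "$fileName" | sed 's/[][\.*^$]/\\&/g')
        printf '%s\n' "$files" | grep -- "/$pattern\$"
    done
}

dup_paths "${1:-.}"
```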