I'm studying shell scripting, and one exercise asks me to calculate the MD5 hash of every file in a folder. It also asks that, if two files have the same hash, their names be printed in the terminal. My code does that, but each match is printed twice. I can't figure out how to exclude the first file name from the following iterations. One more constraint: creating temporary files to help with the task is forbidden.
#!/bin/bash
ifs=$IFS
IFS=$'\n'
echo "Verifying the files inside the directory..."
for file1 in $(find . -maxdepth 1 -type f | cut -d "/" -f2); do
    md51=$(md5sum $file1 | cut -d " " -f1)
    for file2 in $(find . -maxdepth 1 -type f | cut -d "/" -f2 | grep -v "$file1"); do
        md52=$(md5sum $file2 | cut -d " " -f1)
        if [ "$md51" == "$md52" ]; then
            echo "Files $file1 and $file2 are the same."
        fi
    done
done
I also would like to know if there is a more efficient way to do this task.
This approach computes each checksum only once and then filters out the duplicated ones:
mapfile -t list < <(find . -maxdepth 1 -type f -exec md5sum {} + | sort)
mapfile -t dups < <(printf "%s\n" "${list[@]}" | grep -f <(printf "^%s\n" "${list[@]}" | sed 's/ .*//' | sort | uniq -d))
# The array dups now contains all duplicates along with their md5sums.
# You can print the array with a simple printf:
printf "%s\n" "${dups[@]}"
This prints output like:
3b0332e02daabf31651a5a0d81ba830a ./f2.txt
3b0332e02daabf31651a5a0d81ba830a ./fff
c9eb23b681c34412f6e6f3168e3990a4 ./both.txt
c9eb23b681c34412f6e6f3168e3990a4 ./f_out
d41d8cd98f00b204e9800998ecf8427e ./aa
d41d8cd98f00b204e9800998ecf8427e ./abc def.xxx
d41d8cd98f00b204e9800998ecf8427e ./dudu
d41d8cd98f00b204e9800998ecf8427e ./start
d41d8cd98f00b204e9800998ecf8427e ./xx_yy
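To see how the second mapfile line picks its grep patterns, the pattern-building stage can be tried on its own. The following is only a sketch: sample_list and dup_patterns are hypothetical helper names, the hashes are shortened made-up values, and the "^" anchor is added with sed here instead of printf, which should give the same patterns.

```shell
# Hypothetical sample lines in md5sum's "hash  name" format (made-up short hashes).
sample_list() {
    printf '%s\n' \
        'aaaa  ./one' \
        'aaaa  ./two' \
        'bbbb  ./three'
}

# Strip everything after the hash, anchor it with ^, and keep only
# the hashes that occur more than once (uniq -d needs sorted input).
dup_patterns() {
    sed 's/ .*//; s/^/^/' | sort | uniq -d
}

sample_list | dup_patterns    # prints: ^aaaa
```

Each resulting line is a regular expression that grep -f then matches against the start of every entry in the full list, so only entries whose hash is repeated survive.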
The following addition is just for a fancier printout:
echo "duplicates:"
while read -r md5; do
    echo "$md5"
    printf "%s\n" "${dups[@]}" | grep "$md5" | sed 's/[^ ]* / /'
done < <(printf "%s\n" "${dups[@]}" | sed 's/ .*//' | sort -u)
This will print something like:
3b0332e02daabf31651a5a0d81ba830a
  ./f2.txt
  ./fff
c9eb23b681c34412f6e6f3168e3990a4
  ./both.txt
  ./f_out
d41d8cd98f00b204e9800998ecf8427e
  ./aa
  ./abc def.xxx
  ./dudu
  ./start
  ./xx_yy
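A similar grouped listing can probably also be produced in a single pass, without arrays or repeated grep calls. The sketch below is an alternative, not the method above: list_dup_groups is a hypothetical function name, it relies on md5sum separating hash and name with two characters, and it shares the limitation that filenames must not contain newlines.

```shell
# Sketch: group files with identical MD5 sums in one pass, no temporary files.
# list_dup_groups is a hypothetical helper; the directory argument defaults to ".".
list_dup_groups() {
    find "${1:-.}" -maxdepth 1 -type f -exec md5sum {} + |
        sort |
        awk '
            {
                hash = $1
                # md5sum puts two separator characters between hash and name
                name = substr($0, length(hash) + 3)
                if (hash == prev_hash) {
                    # First repeat of this hash: print the header and the earlier file
                    if (!in_group) {
                        print hash
                        print "  " prev_name
                        in_group = 1
                    }
                    print "  " name
                } else {
                    in_group = 0
                }
                prev_hash = hash
                prev_name = name
            }'
}
```

Calling list_dup_groups . in the folder from the exercise should print each duplicated hash once, followed by the indented names of the files that share it. Because sort places equal hashes next to each other, awk only has to compare each line with the previous one.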
Warning: this works only if the filenames don't contain the \n (newline) character. Making the script fully general requires bash 4.4+, where mapfile supports the -d option.
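A NUL-delimited variant along those lines might look like the sketch below. It assumes bash 4.4+ for mapfile -d and a GNU md5sum and sort that support -z; print_dups_nul is a hypothetical function name, and the counting with an associative array is a substitution for the grep -f step, chosen because the records may now contain newlines.

```shell
#!/usr/bin/env bash
# Sketch: NUL-delimited variant that tolerates any filename.
# Assumes bash 4.4+ (mapfile -d) and GNU md5sum/sort with -z support.
# print_dups_nul is a hypothetical helper name.
print_dups_nul() {
    local dir=${1:-.} entry hash
    local -a list
    local -A count

    # Each record is "hash  name", terminated by NUL instead of newline.
    mapfile -d '' -t list < <(
        find "$dir" -maxdepth 1 -type f -exec md5sum -z {} + | sort -z
    )

    # First pass: count how often each hash occurs.
    for entry in "${list[@]}"; do
        hash=${entry%% *}
        count[$hash]=$(( ${count[$hash]:-0} + 1 ))
    done

    # Second pass: emit every record whose hash occurred more than once,
    # again NUL-terminated so the caller can parse it safely.
    for entry in "${list[@]}"; do
        hash=${entry%% *}
        if (( ${count[$hash]} > 1 )); then
            printf '%s\0' "$entry"
        fi
    done
}
```

For a quick look at the result, print_dups_nul . | tr '\0' '\n' turns the records back into lines, at the cost of making names with embedded newlines ambiguous again.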