How to create md5sum for new files

We've created a folder on my dad's computer where everyone in the family can deposit and share their photos and videos.

Example directories:
/Family_Photo/Penguins/2017 09 02/
/Family_Photo/East Beach/2017 10 11/Seaside/
/Family_Photo/East Beach/2017 10 11/Games/

Using md5deep, I am able to create a complete list of checksums for all the files in all subdirectories:

md5deep -r /Family_Photo/ > /Family_Photo/md5sum.log

Instead of regenerating the checksums for all files (newly added and existing) every time, how can I write a bash script that automatically detects any files that have not been hashed yet, generates checksums for just those new files, and appends them to the original md5sum.log?

Solution

This should do the trick:

comm -1 -3 <(grep --text --perl-regex --only-matching '(?<=  ).+' /Family_Photo/md5sum.log | sort) <(find /Family_Photo -type f | sort) | xargs --delimiter='\n' --no-run-if-empty md5deep | tee --append /Family_Photo/md5sum.log
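
If you would rather keep this as a script than a one-liner, the same pipeline can be wrapped in a function. This is a minimal sketch, assuming GNU find, sort, comm, grep (with PCRE support) and xargs, as the one-liner does; the function name update_hashes and the argument order are my own choices. Two deliberate differences from the one-liner: it excludes the log file from the find results (md5sum.log lives inside the folder being scanned, and its hash changes on every append), and it falls back to md5sum when md5deep is not installed, since md5sum prints the same "hash, two spaces, path" lines.

```shell
#!/usr/bin/env bash
# Sketch: append checksums for files not yet listed in the log.
# Assumes GNU find, sort, comm, grep (with -P) and xargs.
set -euo pipefail

# update_hashes DIR LOG  (hypothetical helper name)
# DIR and LOG should be absolute, canonical paths so that the names
# printed by find match the names already stored in LOG.
update_hashes() {
    local dir=$1 log=$2 hasher=md5sum

    # Prefer md5deep; md5sum writes the same "hash  path" format.
    if command -v md5deep >/dev/null 2>&1; then
        hasher=md5deep
    fi

    # Make sure the log exists so grep has something to read.
    touch "$log"

    # First list: names already hashed (text after the double space).
    # Second list: every regular file under DIR except the log itself.
    # comm -1 -3 keeps only the names that appear in the second list.
    comm -1 -3 \
        <(grep --text --perl-regex --only-matching '(?<=  ).+' "$log" | sort) \
        <(find "$dir" -type f ! -path "$log" | sort) |
        xargs --delimiter='\n' --no-run-if-empty "$hasher" >> "$log"
}

# Run only when called with exactly two arguments: DIR LOG.
if (( $# == 2 )); then
    update_hashes "$1" "$2"
fi
```

Saved as, say, update_md5.sh (a hypothetical name), it would be run as bash update_md5.sh /Family_Photo /Family_Photo/md5sum.log, for example from cron. Running it twice in a row appends nothing the second time, because every file is already listed.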

Notes

If you use a different path than the one in the example, make sure it is absolute and canonical, or append the option -exec realpath {} \; to find. md5deep seems to write such paths into the log, and the comparison only works if both lists contain identical paths.
This command line uses bash-specific syntax (process substitution, <(...), which passes a command's output as a file) and might not work in other shells.
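
To see why the paths have to match, here is a small demonstration in a throwaway directory (the file name photo.jpg is made up). Starting find from a relative path produces relative names, which would never equal the absolute names stored in md5sum.log; adding -exec realpath {} \; turns them into absolute, canonical paths. It assumes realpath from GNU coreutils is available.

```shell
#!/usr/bin/env bash
# Demo only: compare relative find output with realpath output.
demo=$(mktemp -d)
touch "$demo/photo.jpg"
cd "$demo" || exit 1

# Starting from "." yields "./photo.jpg", which would not match
# an absolute entry for the same file in the log.
relative=$(find . -type f)

# realpath prints each name as an absolute path with symlinks resolved.
canonical=$(find . -type f -exec realpath {} \;)

printf 'relative:  %s\ncanonical: %s\n' "$relative" "$canonical"
```

GNU realpath accepts several file arguments, so -exec realpath {} + would start far fewer processes on a large photo collection.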

Explanation

comm -1 -3
- We use this command in this specific case to see which files are new by comparing found files to the existing list.
- comm compares two sorted lists and outputs which lines are unique to each and which are common to both
- -1 means: don't show lines unique to the first list
- -3 means: don't show lines common to both lists
- as a result, we only output lines unique to the second list
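
As a tiny illustration with made-up names: suppose the log lists a.jpg and b.jpg, while the folder now holds a.jpg and c.jpg (b.jpg was deleted, c.jpg was added). comm -1 -3 prints only c.jpg. The deleted b.jpg is hidden by -1, so this approach quietly ignores removed files rather than reporting them.

```shell
#!/usr/bin/env bash
# First list: names already in the log. Second list: names found on disk.
# Both must be sorted; here printf emits them already in sorted order.
only_new=$(comm -1 -3 \
    <(printf '%s\n' a.jpg b.jpg) \
    <(printf '%s\n' a.jpg c.jpg))

# -1 hides b.jpg (only in the log), -3 hides a.jpg (in both lists).
echo "$only_new"
```
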
<(grep --text --perl-regex --only-matching '(?<=  ).+' /Family_Photo/md5sum.log | sort) As the first file to comm we pass a list of the filenames that have already been hashed.
- <(...) is bash syntax to pass the result of a program as a file argument
- With grep we extract the file names from the existing file by matching whatever follows the double space
- --text makes grep treat md5sum.log as text even if it would otherwise detect it as binary
- --perl-regex uses Perl regular expression syntax (we need this for look-behind matching)
- --only-matching outputs only the text that matched the pattern, not the entire line containing the match
- '(?<=  ).+' is the matching pattern: (?<=  ) is a "look-behind" that checks that the match is preceded by two spaces; it is followed by .+ (any characters, one or more)
- | sort we pass the output of grep to sort, because comm expects sorted lists
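
Here is the extraction step run on a single sample line, which also shows why the look-behind needs two spaces rather than one. It is a sketch that assumes GNU grep built with PCRE support (--perl-regex); the path is made up to match the example folders, and d41d8cd98f00b204e9800998ecf8427e is just the MD5 of an empty file.

```shell
#!/usr/bin/env bash
# One log line in the "hash  path" format (two spaces as separator).
line='d41d8cd98f00b204e9800998ecf8427e  /Family_Photo/East Beach/2017 10 11/Games/empty.jpg'

# Two-space look-behind: the match starts right after the separator,
# so the single spaces inside the path are not mistaken for it.
name=$(grep --perl-regex --only-matching '(?<=  ).+' <<< "$line")

# With a one-space look-behind the match starts one character early
# and keeps a leading space, so it would never equal find's output.
wrong=$(grep --perl-regex --only-matching '(?<= ).+' <<< "$line")

printf '[%s]\n[%s]\n' "$name" "$wrong"
```

Since an MD5 hash is always 32 hexadecimal characters followed by the two spaces, cut -c 35- should give the same result without needing PCRE, assuming every line in the log uses that layout.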
<(find /Family_Photo -type f | sort) As the second file to comm we pass all the files we find
- <(...) is bash syntax to pass the result of a program as a file
- find recurses into the given directory and prints the path of everything it finds
- -type f instructs find to only output the names of regular files, not directories
- | sort we pass the output of find to sort, because comm expects sorted lists
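
To see what -type f changes, this sketch builds a throwaway tree shaped like the example folders (the file names p1.jpg and g1.jpg are made up) and lists it with and without the option. Directory names containing spaces come through intact, one path per line.

```shell
#!/usr/bin/env bash
# Build a throwaway tree shaped like the example folders.
root=$(mktemp -d)
mkdir -p "$root/Penguins/2017 09 02" "$root/East Beach/2017 10 11/Games"
touch "$root/Penguins/2017 09 02/p1.jpg" "$root/East Beach/2017 10 11/Games/g1.jpg"

# Without -type f, find also prints the root and every directory.
everything=$(find "$root" | sort)

# With -type f only regular files remain, one path per line.
files=$(find "$root" -type f | sort)

printf '%s\n' "$files"
```
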
| xargs --delimiter='\n' --no-run-if-empty md5deep The resulting list of new files is passed to md5deep
- | connects the output of comm to the input of xargs
- xargs calls a command (in this case md5deep), passing the lines it reads from its input as arguments
- --delimiter='\n' specifies a newline as the separator, so that other whitespace in file names isn't mistaken for the start of a new argument
- --no-run-if-empty we don't want to run md5deep if we don't have a single new filename to pass to it.
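
The effect of both xargs options can be checked on a made-up file name that contains a space. This sketch assumes GNU xargs, since --delimiter and --no-run-if-empty are GNU options that other xargs implementations may not offer; printf stands in for md5deep so the individual arguments are easy to see.

```shell
#!/usr/bin/env bash
# A made-up file name containing a space.
input='Seaside/two words.jpg'

# Default xargs splits on blanks: printf receives two arguments.
split_default=$(printf '%s\n' "$input" | xargs printf '[%s]\n')

# --delimiter='\n' splits only on newlines: one argument.
split_newline=$(printf '%s\n' "$input" | xargs --delimiter='\n' printf '[%s]\n')

# GNU xargs runs the command once even on empty input...
empty_default=$(printf '' | xargs echo ran)
# ...unless --no-run-if-empty is given.
empty_r=$(printf '' | xargs --no-run-if-empty echo ran)

printf '%s\n---\n%s\n' "$split_default" "$split_newline"
```

Without --no-run-if-empty, a run with no new files would call md5deep with no file arguments at all, which is exactly what the option prevents.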
| tee --append /Family_Photo/md5sum.log The resulting hashes are appended to the hash file
- This also displays the new files and hashes as they are written. If you don't want to see them, just use >> /Family_Photo/md5sum.log instead.
- | connects the output of md5deep to the input of tee
- tee will output its input and also write it to a file
- --append tells tee to not overwrite file contents, but to append instead
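
A quick check of tee --append against a throwaway file (a stand-in for md5sum.log, created with mktemp) shows that existing entries survive and that the new line is also echoed; the last step shows the quiet >> alternative.

```shell
#!/usr/bin/env bash
# A throwaway stand-in for md5sum.log with one existing entry.
log=$(mktemp)
echo 'existing entry' > "$log"

# tee --append keeps the old contents, adds the new line,
# and also copies it to stdout (captured here in $shown).
shown=$(echo 'new entry' | tee --append "$log")

# The quiet alternative: >> appends without echoing anything.
echo 'quiet entry' >> "$log"

cat "$log"
```
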