We've created a folder on my dad's computer for everyone in the family to deposit and share their photos and videos.
Example of directories:
/Family_Photo/Penguins/2017 09 02/
/Family_Photo/East Beach/2017 10 11/Seaside/
/Family_Photo/East Beach/2017 10 11/Games/
Using md5deep, I am able to create a complete list of checksums for all the files in all subdirectories:
md5deep -r /Family_Photo/ > /Family_Photo/md5sum.log
Instead of regenerating the complete list of md5 checksums for all (newly added and existing) files every time,
how can I create a bash script that automatically detects any files that have not been hashed before, generates the checksums for these new files, and appends them to the original md5sum.log?
Solution
This should do the trick:
comm -1 -3 <(grep --text --perl-regex --only-matching '(?<=  ).+' /Family_Photo/md5sum.log | sort) <(find /Family_Photo -type f | sort) | xargs --delimiter='\n' --no-run-if-empty md5deep | tee -a /Family_Photo/md5sum.log
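Since the question asks for a script, the one-liner could be wrapped roughly like this. It is only a sketch: the function name append_new_checksums, its two parameters and the HASHER override are made up for illustration and are not part of md5deep. It assumes bash, GNU grep, find and xargs, an absolute directory path, and a hasher that prints "hash, two spaces, path" lines (md5deep by default; md5sum uses the same layout).

```shell
#!/usr/bin/env bash
# Sketch only: hash files that are not yet listed in the log and append them.
# append_new_checksums and HASHER are hypothetical names, not md5deep features.
append_new_checksums() {
    local dir=$1 log=$2
    local hasher=${HASHER:-md5deep}   # assumption: prints "<hash>  <path>" lines

    touch "$log"                      # first run: start from an empty log

    # List 1: paths already in the log. List 2: files currently on disk.
    # comm -1 -3 keeps only the lines that appear in list 2 alone.
    comm -1 -3 \
        <(grep --text --perl-regexp --only-matching '(?<=  ).+' "$log" | sort) \
        <(find "$dir" -type f | sort) |
        xargs --delimiter='\n' --no-run-if-empty "$hasher" |
        tee -a "$log"
}

# Usage with the paths from the question:
# append_new_checksums /Family_Photo /Family_Photo/md5sum.log
```

If the log lives inside the scanned directory, as in the question, find also lists the log itself, so it gets hashed once like any other file.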
Notes
If you pass the directory as a relative path, add -exec realpath {} \; to find, because md5deep seems to write absolute paths into the file and we need them to be identical for comparison.
Explanation
comm -1 -3
comm: compares two sorted lists and outputs which lines are unique to each and which are common to both.
-1: don't show lines unique to the first list.
-3: don't show lines common to both lists.

<(grep --text --perl-regex --only-matching '(?<=  ).+' /Family_Photo/md5sum.log | sort)
As the first file to comm we pass a list of the already hashed file names.
<(...): bash syntax (process substitution) to pass the output of a program as a file argument.
grep: extracts the file names from the existing log by matching whatever follows the double space.
--text: makes sure md5sum.log is always treated as a text file and not skipped.
--perl-regex: use Perl regular expression syntax (we need this for look-behind matching).
--only-matching: only output the text that matched the pattern, not the entire line containing the match.
'(?<=  ).+': the matching pattern. (?<=  ) is a "look-behind" that checks whether the match is preceded by two spaces; it is followed by .+ (any characters, one or more).
| sort: we pass the output of grep to sort, because comm expects sorted lists.

<(find /Family_Photo -type f | sort)
As the second file to comm we pass all the files that find discovers.
<(...): bash syntax (process substitution) to pass the output of a program as a file.
find: recurses into the given directory and prints all file names.
-type f: instructs find to only output the names of regular files, not directories.
| sort: we pass the output of find to sort, because comm expects sorted lists.

| xargs --delimiter='\n' --no-run-if-empty md5deep
The resulting list of new files is passed to md5deep.
|: connects the output of comm to the input of xargs.
xargs: calls a command (in this case md5deep) with whatever arrives on its input as arguments.
--delimiter='\n': specifies the newline as separator, so that other whitespace in file names is not mistaken for the start of a new argument.
--no-run-if-empty: we don't want to run md5deep if there is not a single new file name to pass to it.

| tee --append /Family_Photo/md5sum.log
The resulting list of hashes is written to the hash file. If you don't need to see the output on the terminal, you can use >> /Family_Photo/md5sum.log instead.
|: connects the output of md5deep to the input of tee.
tee: prints its input and also writes it to a file.
--append (short form -a, as used in the command above): tells tee not to overwrite the file contents, but to append instead.
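To try the pieces in isolation without touching any real photos, something like the following can be pasted into bash. It is a toy sketch: the log lines and .jpg names are made up, the hashes are dummy placeholders rather than real checksums, and a fixed list stands in for the output of find. It assumes GNU grep for --perl-regexp.

```shell
# Toy log in the "<hash>  <path>" layout; the hashes are dummy placeholders.
log=$(mktemp)
printf '%s  %s\n' \
    00000000000000000000000000000000 '/Family_Photo/Penguins/2017 09 02/a.jpg' \
    11111111111111111111111111111111 '/Family_Photo/East Beach/2017 10 11/Games/b.jpg' > "$log"

# Step 1: extract the already hashed paths (everything after the two spaces).
hashed=$(grep --text --perl-regexp --only-matching '(?<=  ).+' "$log" | sort)

# Step 2: stand-in for the sorted find output; c.jpg is the only new file.
on_disk=$(printf '%s\n' \
    '/Family_Photo/East Beach/2017 10 11/Games/b.jpg' \
    '/Family_Photo/East Beach/2017 10 11/Seaside/c.jpg' \
    '/Family_Photo/Penguins/2017 09 02/a.jpg' | sort)

# Step 3: -1 -3 leaves only the lines unique to the second list, i.e. new files.
new_files=$(comm -1 -3 <(printf '%s\n' "$hashed") <(printf '%s\n' "$on_disk"))
printf '%s\n' "$new_files"   # prints only the Seaside/c.jpg path

rm -f "$log"
```

In the real command, this list of new files is what xargs hands to md5deep.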