I'm collecting md5sum snapshots of the same filesystem at two different points in time (i.e., before and after an infection). I need to diff these two states to see which files changed between them.
To collect these states I might do the following (on macOS with SIP turned off):
sudo gfind / ! -path '*/dev/*' ! -path '*/Network/*' ! -path '*/Volumes/*' ! -path '*/.fseventsd/*' ! -path '*/.Spotlight-V100/*' -type f -exec md5sum {} \; > $(date "+%y%m%d%H%M%S").system_listing
The problem I'm having is that the resulting files are around 100MB apiece, and using diff by itself seems to compare chunks instead of each individual file's md5sum in the output.
Is there an efficient way of using diff tools to do this, or is it necessary to write a script that compares the two files by filename path, effectively recreating diff with the path as the unique key and then reporting results based on the associated md5sum?
The order in which directories appear in the listing can produce a lot of noisy diff output.
For example, I ran the following two commands over two directories full of PDFs, one containing 1 file and the other containing tens of files.
Swapping the directory order produces two diff lines, whereas we want diff to report that there is no difference.
find books/ docs-pdf/ -type f -exec md5sum {} \; > snapshot1
find docs-pdf/ books/ -type f -exec md5sum {} \; > snapshot2
diff -u snapshot1 snapshot2
--- snapshot1
+++ snapshot2
@@ -1,4 +1,3 @@
-83322cb1aaa94f9c8e87925f9d2a695e books/ModSimPy.pdf
192e5d38e59d8295ec9ca715e784a6d0 docs-pdf/c-api.pdf
76c5bfb41bc6e5f9c8da1ab1f915e622 docs-pdf/distributing.pdf
0a630ec314653c68153f5bbc4446660c docs-pdf/extending.pdf
@@ -25,3 +24,4 @@
31e3dc3f78a12c59cdc0426d8e75ec99 docs-pdf/tutorial.pdf
4c59e969009b6c3372804efdfc99e2d9 docs-pdf/using.pdf
cf5330f4ed5ca5f63f300ccfa3057825 docs-pdf/whatsnew.pdf
+83322cb1aaa94f9c8e87925f9d2a695e books/ModSimPy.pdf
After sorting by the second column, diff correctly reports no differences:
sort -k2 snapshot1 >sorted.snapshot1
sort -k2 snapshot2 >sorted.snapshot2
diff sorted.snapshot1 sorted.snapshot2
If this does not remove all of the noisy diff output, please post samples of the output you do not want.
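If a report keyed on the path is more useful than raw diff output, something like the following awk sketch might work. It is only a rough illustration, not a tested tool: the function name compare_snapshots is made up here, and it assumes the usual md5sum line layout (hash, then two separator characters, then the path), that no filename contains a newline or backslash, and that the first snapshot is not empty. It holds the first snapshot in memory, which should be tolerable for listings of around 100MB but is worth checking on your machine.

```sh
# Hypothetical helper: compare two md5sum listings using the path as the key.
# Usage: compare_snapshots OLD_LISTING NEW_LISTING
compare_snapshots() {
    awk '
        {
            # Assumed md5sum layout: <hash><2 separator chars><path>
            hash = $1
            path = substr($0, length(hash) + 3)
        }
        # First file: remember the hash recorded for each path.
        NR == FNR { old[path] = hash; next }
        # Second file: classify each path against the first snapshot.
        {
            if (!(path in old))         print "ADDED\t" path
            else if (old[path] != hash) print "CHANGED\t" path
            delete old[path]
        }
        # Whatever is left was present only in the first snapshot.
        END { for (path in old) print "REMOVED\t" path }
    ' "$1" "$2"
}
```

Calling it with the two listing files as arguments prints one tab-separated line per added, changed, or removed path. Because the result does not depend on line order, the snapshots do not need to be sorted first. The REMOVED lines come out in no particular order, so pipe the result through sort if a stable order matters.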