
How to exclude some files and directories from the output of md5deep?


I am using the md5deep utility to compute the hashes for files while recursively digging through a directory structure.

It lets me run a command like this -

md5deep -r -l -j0 app

and produces output like this (a recursive list of the MD5 hashes of all the underlying files, computed from their content) -

d41d8cd98f00b204e9800998ecf8427e  app/tests/cases/controllers/empty
d41d8cd98f00b204e9800998ecf8427e  app/tests/cases/models/empty
d41d8cd98f00b204e9800998ecf8427e  app/tests/cases/components/empty
d41d8cd98f00b204e9800998ecf8427e  app/tests/cases/helpers/empty
d41d8cd98f00b204e9800998ecf8427e  app/tests/cases/behaviors/empty
d41d8cd98f00b204e9800998ecf8427e  app/tests/groups/empty
d41d8cd98f00b204e9800998ecf8427e  app/tests/fixtures/empty

I then run md5sum on that output to produce a single hash for the entire codebase -

md5deep -r -l -j0 app | md5sum

Output -

86df91fc29f2891ff0aa7aaa4bd13730  -

Now I am stuck on how to exclude some paths (files and directories) from the calculation of the final md5sum. For example, I want to exclude these two paths: app/tests/groups/empty and app/tests/fixtures/empty.

The md5deep documentation describes an option (-f) that takes a file containing a list of file names/directories, but the listed paths are the ones that get included. I am looking for the opposite: excluding a predefined set of files/directories from the dynamic contents of a given directory (new directories/files could be added in the future).

Solutions using regular expressions or a tool/utility other than md5deep are also welcome, as long as they serve my purpose. I feel a regex solution with grep would be complicated in the absence of lookaheads. For example, the following regex is needed just to match any string that does not start with ABC -

^([^A]|A([^B]|B([^C]|$)|$)|$).*$

https://stackoverflow.com/a/1395247/351903


Solution

  • Why not use find together with md5sum?

    find app -type f -exec md5sum {} \;
    d41d8cd98f00b204e9800998ecf8427e  app/tests/groups/empty
    d41d8cd98f00b204e9800998ecf8427e  app/tests/cases/components/empty
    d41d8cd98f00b204e9800998ecf8427e  app/tests/cases/behaviors/empty
    d41d8cd98f00b204e9800998ecf8427e  app/tests/cases/models/empty
    d41d8cd98f00b204e9800998ecf8427e  app/tests/cases/helpers/empty
    d41d8cd98f00b204e9800998ecf8427e  app/tests/cases/controllers/empty
    d41d8cd98f00b204e9800998ecf8427e  app/tests/fixtures/empty
    

    To exclude a directory, negate the -path test with !; to exclude by filename, negate the -name test instead.

    For example, to exclude every file whose pathname contains models, use the following:

    find app -type f ! -path "*models*" -exec md5sum {} \;
    

    BTW, if you're looking for empty files, you can use the -empty test: find app -empty (add -type f to skip empty directories).
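Applied to the two paths from the question, the find approach might look like the sketch below. It assumes GNU md5sum and a find that supports -path; the throwaway demo tree and the variable names (demo, listing, combined) are only there for illustration. The list is sorted by path before the final md5sum because find does not guarantee a traversal order, so an unsorted list could give a different combined hash on another machine. For the same reason, the combined hash will not necessarily equal the one produced by the md5deep pipeline.

```shell
#!/bin/sh
# Sketch: combined MD5 of a tree with some paths excluded.
# Assumes GNU md5sum and a find that supports -path.

# Build a throwaway copy of part of the layout from the question
# (all files are empty, as in the original output).
demo=$(mktemp -d)
cd "$demo" || exit 1
for d in cases/controllers cases/models groups fixtures; do
    mkdir -p "app/tests/$d"
    : > "app/tests/$d/empty"
done

# Hash every regular file except the two excluded paths.
# "-exec ... {} +" passes many files to each md5sum invocation.
# Sorting on the path column keeps the list, and therefore the final
# hash, independent of the order in which find walks the tree.
listing=$(find app -type f \
    ! -path 'app/tests/groups/empty' \
    ! -path 'app/tests/fixtures/empty' \
    -exec md5sum {} + | LC_ALL=C sort -k 2)

printf '%s\n' "$listing"

# Hash the list itself to get one digest for the whole tree.
combined=$(printf '%s\n' "$listing" | md5sum)
printf '%s\n' "$combined"

# Clean up the demo tree.
cd - >/dev/null || exit 1
rm -rf "$demo"
```

The -path pattern is matched against the whole path as find prints it, so it has to begin with the starting point (app here). To drop an entire directory rather than a single file, a pattern such as 'app/tests/groups/*' should work, since * in -path also matches slashes.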