linux recursion find grep piping

Piping find results into grep for fast directory exclusion


I am successfully using find to list every file under the current directory, excluding those in the "cache" subdirectory. Here's my first bit of code:

find . -wholename './cach*' -prune -o -print

I now wish to pipe this into a grep command. It seems like that should be simple:

find . -wholename './cach*' -prune -o -print | xargs grep -r -R -i "samson"

... but this is returning results that are mostly from the cache directory. I've tried removing the xargs call, but that does what you'd expect: it runs grep on the text of the file names rather than on the files themselves. My goal is to find "samson" in any files that aren't cached content.

I'll probably get around this issue by just using doubled greps in this instance, but I'm very curious about why this one-liner behaves this way. I'd love to hear thoughts on a way to modify it while still using these two commands (as there are speed advantages to doing it this way).

(This is in CentOS 5, btw.)


Solution

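  • The -wholename match may be the reason why it's still including "cache" files. The pattern './cach*' is compared against the whole path as find prints it, so it only matches when you run find from the directory that directly contains the "cache" folder. If you're running it from somewhere else, try -name '*cache*' instead, which matches at any depth (bear in mind it also prunes anything else with "cache" in its name).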

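    Also, you do not need -r or -R for your grep. Those tell it to recurse through directories, but you want it to test individual files. They matter here: find's output includes directories, starting with . itself, and grep -r descends into every directory it is given, cache included.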

    You can fix your command either as a pipeline or as a single find command:

    find . -name '*cache*' -prune -o -type f -print0 | xargs -0 grep -il "samson"
    

    or

    find . -name '*cache*' -prune -o -type f -exec grep -iq "samson" {} \; -print
    

    Note: the -l in the first command tells grep to list only the names of matching files, not the matching lines. The second command gets the same result a different way: -q makes grep print nothing and report a match only through its exit status, so find's -print fires just for the files that matched.
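
If you want to see the effect for yourself, here is a small sandbox sketch. The directory and file names (cache/page.html, src/notes.txt, src/other.txt) are made up for illustration, and I've used -path, which is the portable spelling of -wholename. It compares the original pipeline against a version that only passes regular files to a non-recursive grep.

```shell
#!/bin/sh
# Hypothetical sandbox: every path below is invented for this demo.
demo=$(mktemp -d) || exit 1
mkdir -p "$demo/cache" "$demo/src"
echo "Samson was here" > "$demo/src/notes.txt"
echo "samson cached"   > "$demo/cache/page.html"
echo "nothing to see"  > "$demo/src/other.txt"
cd "$demo" || exit 1

# 1. What find hands to xargs: cache/ is pruned, but "." and "./src"
#    (directories) are still in the list.
find . -path './cach*' -prune -o -print | sort

# 2. Original pipeline: grep -r is given ".", so it walks the whole tree
#    again on its own, cache/ included.
broken=$(find . -path './cach*' -prune -o -print \
  | xargs grep -r -i -l "samson" | sort -u)

# 3. Pass only regular files and drop -r, so grep never sees a directory.
fixed=$(find . -path './cach*' -prune -o -type f -print0 \
  | xargs -0 grep -il "samson" | sort -u)

printf 'broken:\n%s\nfixed:\n%s\n' "$broken" "$fixed"

# Clean up the sandbox.
cd / && rm -rf "$demo"
```

With GNU find and grep I'd expect the "broken" list to include the file under cache/ while the "fixed" list contains only ./src/notes.txt. The -print0 / -0 pair is there so file names with spaces or newlines survive the trip through xargs.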