
grep based on blacklist -- without procedural code?


It's a well-known task, simple to describe:

Given a text file foo.txt, and a blacklist file of exclusion strings, one per line, produce foo_filtered.txt that has only the lines of foo.txt that do not contain any exclusion string.

A common application is filtering compiler warnings from a build log to ignore warnings on files that are not yours. Here foo.txt is the warnings file (itself filtered from the build log), and excluded_filenames.txt is the blacklist, with one file name per line.

I know how it's done in procedural languages like Perl or AWK, and I've even done it with combinations of Linux commands such as cut, comm, and sort.

But I feel that I should be really close with xargs, and just can't see the last step.

I know that if excluded_filenames.txt has only 1 file name in it, then

grep -v "`cat excluded_filenames.txt`" foo.txt

will do it.

And I know that I can get the filenames one per line with

xargs -L1 -a excluded_filenames.txt

So how do I combine those two into a single solution, without explicit loops in a procedural language?

Looking for the simple and elegant solution.


Solution

  • Use the -f option, which reads the patterns from a file, one per line:

    grep -vf excluded_filenames.txt foo.txt
    

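    You could also use -F, which is more directly the answer to what you asked (fgrep is the older equivalent of grep -F). It treats its argument as a newline-separated list of fixed strings, so the whole blacklist can be passed as one argument, and each name is matched literally rather than as a regular expression, in which . matches any character: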

    grep -vF "`cat excluded_filenames.txt`" foo.txt
    

    From man grep:

    -f FILE, --file=FILE
              Obtain patterns from FILE, one per line.  The empty file contains zero patterns, and therefore matches nothing.
    
    -F, --fixed-strings
              Interpret PATTERN as a list of fixed strings, separated by newlines, any of which is to be matched.
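A self-contained sketch of the commands above, run on made-up sample data (the paths and warning texts are hypothetical). It shows why -F matters for file names: with plain -vf, the blacklisted name src/a.c is read as a regular expression and also drops the unrelated src/abc.c. It also checks that the -f and command-line forms agree once -F is used, combined with -f as grep -vFf.

```shell
#!/bin/sh
# Work in a scratch directory so the sample files don't clobber real ones.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Hypothetical warnings file, one warning per line.
cat > foo.txt <<'EOF'
src/a.c:10: warning: unused variable 'x'
src/abc.c:42: warning: unused parameter 'y'
src/main.c:7: warning: comparison between signed and unsigned
EOF

# Hypothetical blacklist: warnings from src/a.c are not ours.
printf '%s\n' 'src/a.c' > excluded_filenames.txt

# Plain -f: each line is a regex, so "." also matches the "b" in src/abc.c.
echo "regex (-vf):"
grep -vf excluded_filenames.txt foo.txt

# -F combined with -f: literal match, result saved to foo_filtered.txt.
grep -vFf excluded_filenames.txt foo.txt > foo_filtered.txt
echo "fixed (-vFf):"
cat foo_filtered.txt

# -F with the whole blacklist passed as one newline-separated argument.
echo "fixed (command-line -F):"
grep -vF "$(cat excluded_filenames.txt)" foo.txt
```

The regex run keeps only the src/main.c line, while both -F runs also keep the src/abc.c line.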
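The note about an empty file has a flip side: an empty file supplies no patterns, but an empty line inside the file is an empty pattern, and an empty pattern matches every line. With -v, one stray blank line in the blacklist therefore filters out everything. A sketch on made-up data that deletes blank lines with sed before using the list (the name excluded_clean.txt is only for illustration):

```shell
#!/bin/sh
# Work in a scratch directory so the sample files don't clobber real ones.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Hypothetical warnings file.
printf '%s\n' 'src/main.c: warning one' 'vendor/util.c: warning two' > foo.txt

# Hypothetical blacklist with an accidental blank line at the end.
printf '%s\n' 'vendor/util.c' '' > excluded_filenames.txt

# The empty pattern matches every line, so -v prints nothing here.
echo "raw blacklist:"
grep -vFf excluded_filenames.txt foo.txt

# Delete empty lines from the blacklist, then filter as usual.
sed '/^$/d' excluded_filenames.txt > excluded_clean.txt
echo "cleaned blacklist:"
grep -vFf excluded_clean.txt foo.txt
```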