Tags: regex, unix, grep

Extended Regular Expression: Find a Word That is Not Part of Another Word


I am trying to search for words in a file using egrep. I am limited to egrep and cannot add the -v option, so I must do it through the pattern alone.

Example file

... blah
blah foo blah
blah foobar blah
bhah_foobaz_blah
blah ...

Desired output

blah foo blah
bhah_foobaz_blah

I want to find every line containing an instance of foo that is not part of the word foobar.

From what I could find so far, I thought it would be something like this, but it returns nothing:

egrep -i 'foo+^((?!bar).)*' 

Solution

  • Perl-compatible regexes support negative lookahead, the (?!...) feature you tried to use. It's the ideal way to express the idea of "foo but not foobar". GNU grep enables them with -P:

    grep -P 'foo(?!bar)'
    

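    As a quick check, the sketch below pipes the sample lines from the question straight into grep -P instead of reading a file. It assumes a GNU grep built with PCRE support; BSD and macOS grep do not provide -P.

```shell
#!/bin/sh
# Feed the sample lines from the question to grep on stdin.
# (?!bar) is a negative lookahead: "foo" matches only when the next
# three characters are not "bar". Requires GNU grep built with PCRE (-P).
printf '%s\n' '... blah' 'blah foo blah' 'blah foobar blah' 'bhah_foobaz_blah' 'blah ...' |
    grep -P 'foo(?!bar)'
```

    This should print the two desired lines, blah foo blah and bhah_foobaz_blah.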
    If you're limited to POSIX extended regular expressions, there is no equivalent feature. Expressing a non-match without negative lookahead is possible, but quite convoluted.

    One way to do it is to check, character by character, what follows foo. The next character is either

    1. End of string ($)
    2. Any character except "b" ([^b])
    3. A "b" (b)

    If it's either of the first two cases, you're done: it's a match. If it's a b, then you have to check the character following the b using the same three-part pattern. The pattern looks like $|[^b]|b(...), where the ... represents a nested pattern. Putting all the nested patterns together, you get:

    grep -E 'foo($|[^b]|b($|[^a]|a($|[^r])))'
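    The nesting is mechanical, so it can be generated for any excluded suffix. The sketch below uses a hypothetical POSIX sh function, not_followed_by, which emits one ($|[^c]|c...) level per character of the suffix. It assumes the suffix is non-empty and made only of letters or digits; characters such as ], ^, - or \ would need escaping inside the bracket expression, and ERE metacharacters in either argument would need escaping outside it.

```shell
#!/bin/sh
# Hypothetical helper: print a POSIX ERE that matches PREFIX when it is
# not followed by SUFFIX, built from nested ($|[^c]|c(...)) groups.
# Assumes SUFFIX is non-empty and contains only letters/digits.
not_followed_by() {
    prefix=$1
    rest=$2
    pattern=''
    close=''
    while [ -n "$rest" ]; do
        c=${rest%"${rest#?}"}   # first character of the remaining suffix
        rest=${rest#?}          # drop that character
        if [ -n "$rest" ]; then
            # More characters follow: end, a non-match, or c then recurse.
            pattern="$pattern(\$|[^$c]|$c"
        else
            # Last character: only end or a non-match is allowed here.
            pattern="$pattern(\$|[^$c]"
        fi
        close="$close)"
    done
    printf '%s%s%s\n' "$prefix" "$pattern" "$close"
}

# Usage: build the pattern for "foo but not foobar" and run it on the sample.
pattern=$(not_followed_by foo bar)
printf '%s\n' "$pattern"
printf '%s\n' '... blah' 'blah foo blah' 'blah foobar blah' 'bhah_foobaz_blah' 'blah ...' |
    grep -E "$pattern"
```

    For foo and bar this reproduces the pattern above character for character, and the grep -E call should print blah foo blah and bhah_foobaz_blah. Since egrep is equivalent to grep -E, the same pattern string can be passed to egrep unchanged.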