Regex negative lookbehind: match a word as long as it is not preceded by another word


I'm trying to create a simple regex that captures every occurrence of Dogs, as long as it is not immediately preceded by the word Cats. Here are some examples to test the regex:

  1. My Dogs are happy -> Should match (preceded by "My", which is valid)
  2. Dogs are humans best friend -> Should match (first word, so it is not preceded by anything)
  3. This is invalid Cats Dogs -> Should NOT match (preceded by the invalid word Cats)
  4. The Dogs and Cats and Dogs and Dogs -> Should match (multiple "Dogs" found, and none is immediately preceded by "Cats")
  5. The TomCats Dogs are valid -> Should match (TomCats is a word in its own right, different from Cats)

I'm trying with a regex similar to this:

((?<!\bCats\b)\s*\bDogs\b)

This doesn't give the right results: it matches all five cases, when it should not match case 3.

Also, if I use something similar:

((?<!\bCats\b)\s+\bDogs\b)

It returns the right result for cases 1 and 3, but it does not match case 2, since Dogs is at the beginning of the string and is not preceded by whitespace.
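
To make the behaviour of the two attempts above easier to reproduce, here is a minimal Java sketch that runs both patterns against the five test strings. The class, field, and method names (LookbehindAttempts, found, and so on) are made up for illustration; only the two regexes and the test strings come from the question. A likely explanation for the first result: because `\s*` may match zero characters, the engine can start the match directly in front of Dogs, where the four preceding characters are "ats " rather than "Cats", so the lookbehind no longer blocks case 3.

```java
import java.util.regex.Pattern;

class LookbehindAttempts {
    // The two patterns from the question, copied unchanged.
    static final Pattern ZERO_OR_MORE_SPACES =
            Pattern.compile("((?<!\\bCats\\b)\\s*\\bDogs\\b)");
    static final Pattern ONE_OR_MORE_SPACES =
            Pattern.compile("((?<!\\bCats\\b)\\s+\\bDogs\\b)");

    // The five test strings from the question, in order (cases 1 to 5).
    static final String[] CASES = {
        "My Dogs are happy",
        "Dogs are humans best friend",
        "This is invalid Cats Dogs",
        "The Dogs and Cats and Dogs and Dogs",
        "The TomCats Dogs are valid"
    };

    // True if the pattern matches anywhere in the input.
    static boolean found(Pattern pattern, String input) {
        return pattern.matcher(input).find();
    }

    public static void main(String[] args) {
        for (String s : CASES) {
            System.out.printf("%-40s attempt 1: %-5b attempt 2: %b%n",
                    s,
                    found(ZERO_OR_MORE_SPACES, s),
                    found(ONE_OR_MORE_SPACES, s));
        }
    }
}
```

With these inputs, attempt 1 reports a match for every case (including case 3), while attempt 2 rejects case 3 but also rejects case 2.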

Case sensitivity is not a problem here. I'm using Java to test this regex.


Solution

  • If I understand your requirements correctly, you can use this regex, which relies on a negative lookahead instead of a lookbehind:

    ^(?!.*\bCats\s+Dogs\b).*?\bDogs\b
    


    RegEx Details:

    • ^: Start of the input
    • (?!.*\bCats\s+Dogs\b): Negative lookahead that fails the match if the word Cats, followed by one or more whitespace characters and then the word Dogs, appears anywhere in the input
    • .*?\bDogs\b: Lazily match zero or more characters, then the word Dogs
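
Since the question mentions Java, here is a small sketch of how the regex above might be checked against the five test strings. The class and method names (LookaheadSolution, accepts, firstMatch) are hypothetical; only the pattern and the inputs are taken from the question and the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadSolution {
    // The regex from the answer, with backslashes doubled for a Java string literal.
    static final Pattern NO_CATS_BEFORE_DOGS =
            Pattern.compile("^(?!.*\\bCats\\s+Dogs\\b).*?\\bDogs\\b");

    // True if the line contains Dogs and never contains "Cats<whitespace>Dogs".
    static boolean accepts(String line) {
        return NO_CATS_BEFORE_DOGS.matcher(line).find();
    }

    // Text of the first (and, because of ^, only) match, or null if there is none.
    static String firstMatch(String line) {
        Matcher m = NO_CATS_BEFORE_DOGS.matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] cases = {
            "My Dogs are happy",
            "Dogs are humans best friend",
            "This is invalid Cats Dogs",
            "The Dogs and Cats and Dogs and Dogs",
            "The TomCats Dogs are valid"
        };
        for (String s : cases) {
            System.out.printf("%-40s -> %b%n", s, accepts(s));
        }
    }
}
```

Note that this pattern is anchored with ^, so it accepts or rejects the input as a whole and yields at most one match per input (for case 4 the match is "The Dogs", up to the first occurrence), rather than one match per occurrence of Dogs.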