Python Negative Lookbehind with a variable number of characters


I know there are a lot of regex and negative lookbehind questions, but I cannot find an answer to this one. I want to find instances of water, but not when never appears shortly before it, with a variable number of characters between the two words. The number of characters between them is not fixed, and lookbehind in Python's re module does not allow variable-width patterns. My current pattern does exclude never, but it excludes it anywhere in the text, even when it appears at the very start. Is there a way to limit a lookbehind to only 20 or 30 characters? What I have:

(?i)^(?=.*?(?:water))(?:(?!never).)*$

Just some of the examples I am working with:

water                                                         (match)
I have water                                                  (match)
I never have water
Where is the water.                                           (match)
I never have food or water
I never have food but I always have water                     (match)
I never have food or chips. I like to walk. I have water      (match)

Again, the problem is that I could have a paragraph that is 10 sentences long, and if it has never anywhere in it, my pattern will not find water, while lookbehind does not accept variable-width patterns. I appreciate any help you could give.
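
A minimal reproduction of the problem described above, assuming Python's built-in re module. The helper name original_matches is only for illustration:

```python
import re

# Pattern from the question: the lookahead requires "water" somewhere,
# then (?:(?!never).)* refuses to step over "never" at any position,
# no matter how far it is from "water".
ORIGINAL = re.compile(r"(?i)^(?=.*?(?:water))(?:(?!never).)*$")


def original_matches(line):
    """Return True if the question's pattern matches the line."""
    return ORIGINAL.search(line) is not None


print(original_matches("I have water"))
# The next line should count as a match, but the pattern rejects it
# because "never" appears somewhere before "water".
print(original_matches("I never have food but I always have water"))
```

The second call returns False even though never is far away from water, which is the behavior I want to change.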


Solution

  • You can use this regex with Python's built-in re module:

    (?i)^(?!.*\bnever\b.{,20}\bwater\b).*\bwater\b
    

    RegEx Details:

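    • (?i): Enable case-insensitive matching
    • ^: Start of the string
    • (?!.*\bnever\b.{,20}\bwater\b): Negative lookahead condition. This fails the match if the word never is followed by the word water with at most 20 characters between them. In Python, .{,20} is equivalent to .{0,20}; raise the upper bound to widen the window.
    • .*\bwater\b: Find the word water anywhere in the line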
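
To check the pattern against the sample lines from the question, a short sketch along these lines can be used. The helper name has_unnegated_water and the SAMPLES list are illustrative and not part of the original answer:

```python
import re

# Pattern from the answer: reject the line if "never" is followed by
# "water" with at most 20 characters in between; otherwise require
# the whole word "water" somewhere in the line.
PATTERN = re.compile(r"(?i)^(?!.*\bnever\b.{,20}\bwater\b).*\bwater\b")

# Sample lines from the question, paired with the outcome the asker wants.
SAMPLES = [
    ("water", True),
    ("I have water", True),
    ("I never have water", False),
    ("Where is the water.", True),
    ("I never have food or water", False),
    ("I never have food but I always have water", True),
    ("I never have food or chips. I like to walk. I have water", True),
]


def has_unnegated_water(line):
    """Return True if the line contains "water" with no nearby "never" before it."""
    return PATTERN.search(line) is not None


for line, expected in SAMPLES:
    result = has_unnegated_water(line)
    status = "ok" if result == expected else "MISMATCH"
    print(f"{status:8} {result!s:5} {line}")
```

Note that . does not match a newline by default and ^ only anchors at the start of the string, so for a multi-line paragraph one option is to test each line separately. Also, because the lookahead scans the whole line, a single never close to one water rejects the line even if another water appears elsewhere in it.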