
Python re negative lookbehind assertion if following pattern allows repetitions

I can't make a negative lookbehind assertion work with the Python re module if the pattern that follows it allows repetitions:

import re

ok = re.compile( r'(?<!abc)def' )
print( ok.search( 'abcdef' ) )
# -> None (ok)
print( ok.search( 'abc def' ) )
# -> 'def' (ok)

nok = re.compile( r'(?<!abc)\s*def' )
print( nok.search( 'abcdef' ) )
# -> None (ok)
print( nok.search( 'abc def' ) )
# -> 'def'. Why???

In my real use case, I want to find a match in a file only if the match is not preceded by 'function ':

# Must match
mustMatch = 'x = myFunction( y )'

# Must not match
mustNotMatch = 'function x = myFunction( y )'

# Tried without success (always matches)
tried = re.compile( r'(?<!\bfunction\b)\s*\w+\s*=\s*myFunction' )
print( tried.search( mustMatch ) )
# -> match
print( tried.search( mustNotMatch ) )
# -> match as well. Why???

Is that a limitation?


  • " -> 'def'. Why???"

    Well, it's quite logical. Look at your pattern: (?<!abc)\s*def

    • (?<!abc) - Negative lookbehind: succeeds at every position that is not immediately preceded by abc, which in your string is every position but one
    • \s* - Zero or more whitespace characters
    • def - Literally matches def
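A minimal sketch, assuming Python 3's re module, that annotates the question's nok pattern with re.VERBOSE (so each part carries a comment) and prints where the match actually starts:

```python
import re

# The question's pattern, annotated. In verbose mode, literal whitespace in
# the pattern is ignored, so whitespace must be written as \s.
nok = re.compile(r"""
    (?<!abc)   # succeed only where the previous three characters are not abc
    \s*        # zero or more whitespace characters (may match nothing)
    def        # the literal text def
""", re.VERBOSE)

m = nok.search('abc def')
# The lookbehind fails at index 3 (right after abc) but succeeds at index 4,
# where it sees 'bc ' instead; \s* then matches the empty string.
print(m.start(), m.end(), repr(m.group()))  # 4 7 'def'
print(nok.search('abcdef'))                 # None
```

The match starts at index 4, after the space, so the space is never part of the match and the lookbehind never has to "see past" it.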

    Thus the regex returns def as a match. To make more sense of this, here is a small representation of the positions that are still valid after the negative lookbehind:

    [Figure: positions in 'abc def' that remain valid after (?<!abc); only the position directly after abc is excluded]

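    As you can see, 7 valid positions remain. Adding \s* does not change anything, since * means zero or more: at the position right before def, the lookbehind sees 'bc ' rather than abc, so it succeeds, \s* matches the empty string, and def matches.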
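The valid positions can also be listed programmatically. This is a small sketch, assuming Python 3's re module: it tries the zero-width pattern (?<!abc) at every index of 'abc def' using Pattern.match with its pos argument, which (unlike slicing the string) still lets the lookbehind inspect the characters before pos:

```python
import re

text = 'abc def'
lookbehind = re.compile(r'(?<!abc)')

# A zero-width pattern can be attempted at every index 0..len(text) inclusive.
# Pattern.match(string, pos) anchors the attempt at exactly that index.
valid = [i for i in range(len(text) + 1) if lookbehind.match(text, i)]
print(valid)       # [0, 1, 2, 4, 5, 6, 7]
print(len(valid))  # 7
```

Only index 3, immediately after abc, is rejected; index 4 (right before def) is accepted, which is why the question's nok pattern still finds def.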

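    So first apply what is explained here, and then use a pattern such as (?<!\bfunction\b\s)\w+\s*=\s*myFunction to retrieve your matches. Here the whitespace after function is part of the lookbehind, so the check happens immediately before the identifier. There may be neater ways, though.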
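A minimal sketch, assuming Python 3's re module, that checks the suggested pattern against the two strings from the question. The two-letter variable name in `longer` and the `\b`-anchored `anchored` pattern are hypothetical illustrations of one possible pitfall, not part of the original answer; the last lines show why `\s+` cannot simply be moved into the lookbehind.

```python
import re

must_match = 'x = myFunction( y )'
must_not_match = 'function x = myFunction( y )'

# Pattern from the answer: the whitespace after 'function' now sits inside
# the lookbehind, which Python's re requires to be fixed width (9 chars here).
suggested = re.compile(r'(?<!\bfunction\b\s)\w+\s*=\s*myFunction')
print(suggested.search(must_match).group())  # x = myFunction
print(suggested.search(must_not_match))      # None

# Hypothetical input with a two-letter name: the search can start at 'y',
# where the lookbehind sees 'unction x' instead of 'function '.
longer = 'function xy = myFunction( y )'
print(suggested.search(longer).group())      # y = myFunction

# Illustrative variant: \b before \w+ only lets a match start at a word start.
anchored = re.compile(r'(?<!\bfunction\b\s)\b\w+\s*=\s*myFunction')
print(anchored.search(longer))               # None
print(anchored.search(must_match).group())   # x = myFunction

# A variable-width lookbehind such as \s+ is rejected by re at compile time.
try:
    re.compile(r'(?<!\bfunction\s+)\b\w+')
    variable_width_ok = True
except re.error:
    variable_width_ok = False
print(variable_width_ok)                     # False
```

Because the lookbehind must be fixed width, `(?<!\bfunction\b\s)` only guards against exactly one whitespace character after function; input with two or more whitespace characters there would still match and would need a different approach.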