I can't get a negative lookbehind assertion to work with the Python re module when the pattern that follows it allows repetitions:
import re
ok = re.compile( r'(?<!abc)def' )
print( ok.search( 'abcdef' ) )
# -> None (ok)
print( ok.search( 'abc def' ) )
# -> 'def' (ok)
nok = re.compile( r'(?<!abc)\s*def' )
print( nok.search( 'abcdef' ) )
# -> None (ok)
print( nok.search( 'abc def' ) )
# -> 'def'. Why???
My real use case is that I want to find a match in a file only if the match is not preceded by 'function ':
# Must match
mustMatch = 'x = myFunction( y )'
# Must not match
mustNotMatch = 'function x = myFunction( y )'
# Tried without success (always matches)
tried = re.compile( r'(?<!\bfunction\b)\s*\w+\s*=\s*myFunction' )
print( tried.search( mustMatch ) )
# -> match
print( tried.search( mustNotMatch ) )
# -> match as well. Why???
Is that a limitation?
"-> 'def'. Why???"
Well, it's quite logical. Look at your pattern: (?<!abc)\s*def
(?<!abc) - Negative lookbehind: allows only positions that are not directly preceded by abc, which still leaves all but one position in your string
\s* - Zero or more whitespace characters
def - Literally matches def
Thus, def is returned as a match. To make more sense of this, consider the positions in 'abc def' that are still valid after the negative lookbehind: the string has eight positions (0 through 7), and only position 3, directly after abc, is rejected.
So there are still 7 valid positions, and adding \s* does not change anything, since * means zero or more: the match simply starts at position 4, where \s* matches nothing and def follows.
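The positions that survive the lookbehind can be listed directly. This is only a minimal sketch: it runs the bare assertion (?<!abc) over the string from the question with re.finditer, and the variable names are mine.

```python
import re

text = 'abc def'

# The bare lookbehind matches the empty string at every position
# that is NOT directly preceded by 'abc'.
valid_positions = [m.start() for m in re.finditer(r'(?<!abc)', text)]
print(valid_positions)  # [0, 1, 2, 4, 5, 6, 7]

# The full pattern from the question starts at one of those positions.
nok = re.compile(r'(?<!abc)\s*def')
m = nok.search(text)
print(m.start(), repr(m.group()))  # 4 'def'
```

Only position 3 is missing from the list, and the match begins at position 4, right after the space.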
So first apply what is explained here, and then use a pattern along the lines of (?<!\bfunction\b\s)\w+\s*=\s*myFunction
to retrieve your matches. There may be neater ways, though.
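Here is a rough sketch of that pattern against the two strings from the question. Python's re only accepts fixed-width lookbehinds, so \s* cannot simply be moved inside the assertion, and the pattern therefore expects exactly one whitespace character after function. Two things in the sketch are my own assumptions rather than part of the suggestion above: the collapse_spaces helper, and the extra \b in the anchored variant, which keeps the match from starting in the middle of a longer variable name.

```python
import re

# Pattern suggested above: the lookbehind is fixed-width
# ('function' followed by exactly one whitespace character).
suggested = re.compile(r'(?<!\bfunction\b\s)\w+\s*=\s*myFunction')

# Hypothetical variant: the extra \b forces the match to start at a word boundary.
anchored = re.compile(r'(?<!\bfunction\b\s)\b\w+\s*=\s*myFunction')

def collapse_spaces(line):
    """Hypothetical helper: reduce every run of whitespace to a single space."""
    return re.sub(r'\s+', ' ', line)

must_match = 'x = myFunction( y )'
must_not_match = 'function x = myFunction( y )'

print(suggested.search(must_match))      # matches 'x = myFunction'
print(suggested.search(must_not_match))  # None

# With a longer variable name, the suggested pattern can start inside 'xy'.
longer = 'function xy = myFunction( y )'
print(suggested.search(longer))  # matches 'y = myFunction'
print(anchored.search(longer))   # None

# Several spaces after 'function' are only rejected once they are collapsed.
spaced = 'function    x = myFunction( y )'
print(anchored.search(spaced))                   # matches 'x = myFunction'
print(anchored.search(collapse_spaces(spaced)))  # None
```

If your variable names can be longer than one character, the anchored variant is probably the safer starting point.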