I'm trying to match the word int when it is either on its own or surrounded by underscores (_).
int # match
_int_ # match
__int__ # match
some_int # match
int_var # match
integration # doesn't match
mint # doesn't match
This is what I've been trying, but it only matches the second case above:
pattern = re.compile(r"(?<=[\W_])int(?=[\W_])")
How should I go about doing this? Thanks everyone
You need to use the double negation logic in this case:
(?<![^\W_])int(?![^\W_])
See the regex demo.
The (?<![^\W_]) lookbehind matches a location that is not immediately preceded by a letter or digit (that is, by any char other than a non-word char or _). It means there must be the start of the string, a non-word char, or _ immediately on the left.
The (?![^\W_]) lookahead matches a location that is not immediately followed by a letter or digit (that is, by any char other than a non-word char or _). It means there must be the end of the string, a non-word char, or _ immediately on the right.
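As a quick check, here is a minimal sketch that runs the double-negation pattern against the sample words from the question. The names samples and matched are only illustrative, and each word is tested as a separate string.

```python
import re

# Double-negation pattern from the answer: no letter or digit
# is allowed directly before or after "int".
pattern = re.compile(r"(?<![^\W_])int(?![^\W_])")

# Sample words taken from the question, each tested as its own string.
samples = ["int", "_int_", "__int__", "some_int", "int_var", "integration", "mint"]

# Keep only the words in which the pattern finds a match.
matched = [s for s in samples if pattern.search(s)]
print(matched)
# → ['int', '_int_', '__int__', 'some_int', 'int_var']
```

Note that a digit next to int (for example int2) also blocks the match, because digits are word chars other than _.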
In your regex, the (?<=[\W_]) positive lookbehind requires a non-word char or _ immediately on the left, and the (?=[\W_]) positive lookahead requires a non-word char or _ immediately on the right. Since a positive lookaround needs an actual char to be present, these lookarounds do not allow matches at the start or end of the string.
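To see the effect of the positive lookarounds, the sketch below runs the pattern from the question against the same sample words, each as a separate string (the variable names are illustrative). Only the words where int has a char on both sides get through.

```python
import re

# Pattern from the question: positive lookarounds on both sides.
original = re.compile(r"(?<=[\W_])int(?=[\W_])")

samples = ["int", "_int_", "__int__", "some_int", "int_var", "integration", "mint"]

# Words where "int" touches the start or end of the string are rejected,
# because a positive lookaround must consume-check a real character there.
original_matches = [s for s in samples if original.search(s)]
print(original_matches)
# → ['_int_', '__int__']
```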
NOTE: As you are using Python re, you cannot simply add a ^ alternative to your lookbehind, because Python re does not allow lookbehinds with non-fixed-width patterns. (?<=[\W_]|^)int(?=[\W_]|$) will work in PHP/PCRE, Java, Ruby/Onigmo, but won't work in Python re. That is why the double-negation approach is the most convenient option here.
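A small sketch of that limitation, assuming the standard library re module: compiling the pattern with the alternation inside the lookbehind raises re.error. The exact error text may differ between Python versions, so the code only records that compilation failed (compile_failed is an illustrative name).

```python
import re

# The lookbehind mixes a one-character branch ([\W_]) with the
# zero-width ^ anchor, so it does not have a single fixed width.
variable_width = r"(?<=[\W_]|^)int(?=[\W_]|$)"

try:
    re.compile(variable_width)
    compile_failed = False
except re.error as exc:
    # Python re rejects the pattern at compile time.
    compile_failed = True
    print("re.compile failed:", exc)
```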