
Regex: match words that are on their own or surrounded by underscores

I'm trying to match the word int when it's either on its own or surrounded by underscores (_).

int  # match
_int_  # match
__int__  # match
some_int  # match
int_var  # match
integration  # doesn't match
mint  # doesn't match

This is what I've been trying, but it only matches cases like the second one above, where int has an underscore on both sides:

import re
pattern = re.compile(r"(?<=[\W_])int(?=[\W_])")

How should I go about doing this? Thanks everyone


  • You need to use the double negation logic in this case:

    pattern = re.compile(r"(?<![^\W_])int(?![^\W_])")

    See the regex demo.
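    As a local check alongside the demo, here is a minimal sketch (assuming Python 3's re module and the sample strings from the question) that runs the double negation pattern (?<![^\W_])int(?![^\W_]) over each sample and over one longer, made-up line of text.

```python
import re

# Match "int" unless a letter or digit sits directly next to it.
# [^\W_] means "neither a non-word char nor _", i.e. a letter or a digit.
pattern = re.compile(r"(?<![^\W_])int(?![^\W_])")

samples = ["int", "_int_", "__int__", "some_int", "int_var", "integration", "mint"]

for s in samples:
    print(f"{s!r:15} {'match' if pattern.search(s) else 'no match'}")

# findall works the same way inside a longer (hypothetical) line of code:
# "int" and "some_int" qualify, "mint" does not.
print(pattern.findall("int x = some_int + mint"))
```

    The first five samples match and "integration" and "mint" do not, which is the behaviour the question asks for.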

    The (?<![^\W_]) negative lookbehind matches a location that is not immediately preceded by a letter or digit, because [^\W_] matches any char that is neither a non-word char nor _. In other words, immediately on the left there must be the start of the string, a non-word char, or an _.

    The (?![^\W_]) negative lookahead matches a location that is not immediately followed by a letter or digit. In other words, immediately on the right there must be the end of the string, a non-word char, or an _.
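    To make the two lookarounds above concrete, this small sketch (Python 3 re; the test characters are arbitrary picks) shows which characters [^\W_] actually matches, and that the negative lookarounds succeed at the string edges, where there is no neighbouring character at all.

```python
import re

# [^\W_] excludes non-word chars (\W) and "_", so it matches letters and digits only.
alnum = re.compile(r"[^\W_]")
chars = ["a", "Z", "5", "_", "-", " "]
alnum_chars = [c for c in chars if alnum.fullmatch(c)]
print(alnum_chars)

# At position 0 there is no preceding character, so nothing there can be a
# letter or digit and the negative lookbehind succeeds.
at_start = re.match(r"(?<![^\W_])int", "int") is not None
# Likewise, at the end of the string the negative lookahead succeeds.
at_end = re.search(r"int(?![^\W_])", "some_int") is not None
print(at_start, at_end)
```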

    In your regex, the (?<=[\W_]) positive lookbehind requires a non-word char or _ immediately on the left, and the (?=[\W_]) positive lookahead requires a non-word char or _ immediately on the right. Since both need an actual character to be present, they cannot match when int is at the very start or end of the string, which is why int, some_int and int_var fail.
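    A quick way to see that the string edges are the only problem with the question's pattern is to run it on the samples as they are and then padded with a space on each side (the padding is just a diagnostic trick, not a fix). A sketch assuming Python 3 re:

```python
import re

# The question's pattern: both lookarounds REQUIRE a [\W_] character next to
# "int", so nothing can match at the very start or end of the string.
original = re.compile(r"(?<=[\W_])int(?=[\W_])")

samples = ["int", "_int_", "__int__", "some_int", "int_var"]
plain_hits = [s for s in samples if original.search(s)]
print(plain_hits)

# With a space on both sides every sample has a neighbour, and all of them match.
padded_hits = [s for s in samples if original.search(f" {s} ")]
print(padded_hits)
```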

    NOTE: As you are using Python's re, you cannot simply add a ^| alternative to your lookbehind, because re does not allow lookbehinds with variable-width patterns. (?<=[\W_]|^)int(?=[\W_]|$) will work in PHP/PCRE, Java and Ruby/Onigmo, but not in Python's re. That is why the double negation approach is the simplest option here.
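    The sketch below (Python 3 re) confirms that the variable-width lookbehind is rejected at compile time. For comparison it also shows a different workaround that re does accept: moving ^ outside the lookbehind as (?:^|(?<=[\W_])), so the lookbehind itself stays one character wide. Lookaheads may vary in width, so (?=[\W_]|$) needs no change. This is offered as an alternative, not as part of the answer above.

```python
import re

# "[\W_]|^" mixes a 1-character branch with a zero-width one, so re refuses
# to compile it as a lookbehind.
try:
    re.compile(r"(?<=[\W_]|^)int(?=[\W_]|$)")
    variable_lookbehind_ok = True
except re.error as exc:
    variable_lookbehind_ok = False
    print("re.error:", exc)

# Alternative: keep the lookbehind fixed-width and put "^" outside it.
workaround = re.compile(r"(?:^|(?<=[\W_]))int(?=[\W_]|$)")

samples = ["int", "_int_", "__int__", "some_int", "int_var", "integration", "mint"]
hits = [s for s in samples if workaround.search(s)]
print(hits)
```

    Both versions accept the same samples; the double negation form is simply shorter because the negative lookarounds cover the string edges on their own.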