Tags: python, regex, quotation-marks, capturing-group

How to catch a pattern that's not in the non-capturing group? - Python


Given the string:

I'll be going home I've the 'v ' isn't want I want to split but I want to catch tokens like 'v and 'w ' .

The goal is to catch:

'v 
'v
'w

But avoid 've and 'll and 't.

I've tried to catch 've, 'll and 't with (?i)\'(?:ve|ll|t)\b, e.g.

>>> import re
>>> x = "I'll be going home I've the 'v ' isn't want I want to split but I want to catch tokens like 'v and 'w ' ."
>>> pattern = r"(?i)\'(?:ve|ll|t)\b"
>>> re.findall(pattern, x)
["'ll", "'ve", "'t"]

I've also tried to negate the non-capturing group in (?i)\'(?:ve|ll|t)\b by writing (?i)\'[^(?:ve|ll|t)]\b, but that doesn't catch the 'v and 'w tokens, which is the goal.

How do I catch the substrings that follow a single quote but aren't in a list of pre-defined substrings, i.e. 'll, 've and 't?


I've also tried this, which didn't work either:

pattern = "(?i)\'(?:[^ve|ll|t|\s])\b"

but [^...] only matches a single character, not substrings.
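A quick way to see why the negated class misses 'v is to run it on the string from the question. This is only a sketch: the names negated_class and result are illustrative and not from the original post.

```python
import re

x = "I'll be going home I've the 'v ' isn't want I want to split but I want to catch tokens like 'v and 'w ' ."

# Inside [...], the characters ( ? : | ) are plain literals, so this class
# means "one character that is none of ( ? : v e | l t )", not
# "anything other than the substrings ve, ll or t".
negated_class = r"(?i)'[^(?:ve|ll|t)]\b"

result = re.findall(negated_class, x)
print(result)

# 'v is rejected because the single character v is in the excluded set,
# while 'w gets through because w is not.
print("'v" in result, "'w" in result)
```

Because v is one of the excluded characters, 'v can never match, regardless of what follows it.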


Solution

  • Maybe this one will work?

    \'(?!ve|ll|t|\s)\w+
    

    You can use a negative lookahead assertion to filter out what you don't want.

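    Update

    In some regex engines, a lookaround assertion must be fixed length. As far as I know, this restriction usually applies to lookbehind rather than lookahead.

    For example, Python's re module rejects the lookbehind (?<!ve|t) because ve and t have different lengths, while the lookahead (?!ve|t) is accepted.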
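The lookahead pattern can be checked against the string from the question. The sketch below is a minimal check, not a definitive solution: the variable names and the 'table input are illustrative, and the strict variant with \b is a suggested tightening that is not part of the original answer.

```python
import re

x = "I'll be going home I've the 'v ' isn't want I want to split but I want to catch tokens like 'v and 'w ' ."

# Pattern from the answer: a quote NOT followed by ve, ll, t or whitespace,
# then one or more word characters.
pattern = r"'(?!ve|ll|t|\s)\w+"
matches = re.findall(pattern, x)
print(matches)  # ["'v", "'v", "'w"]

# Assumed refinement: add \b so only the whole suffixes 've, 'll and 't
# are excluded, and (?i) to keep the case-insensitivity of the question.
strict = r"(?i)'(?!(?:ve|ll|t)\b|\s)\w+"
strict_matches = re.findall(strict, x)
print(strict_matches)  # ["'v", "'v", "'w"]

# The difference shows up on a token that merely starts with t.
print(re.findall(pattern, "'table"))  # []
print(re.findall(strict, "'table"))   # ["'table"]
```

Without the \b, any token beginning with ve, ll or t is skipped, which may or may not be what you want.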