Given the string:
I'll be going home I've the 'v ' isn't want I want to split but I want to catch tokens like 'v and 'w ' .
The goal is to catch:
'v
'v
'w
But avoid 've, 'll and 't.
I've tried to catch the 've, 'll and 't with (?i)\'(?:ve|ll|t)\b, e.g.
>>> import re
>>> x = "I'll be going home I've the 'v ' isn't want I want to split but I want to catch tokens like 'v and 'w ' ."
>>> pattern = r"(?i)\'(?:ve|ll|t)\b"
>>> re.findall(pattern, x)
["'ll", "'ve", "'t"]
I've also tried to negate the non-capturing group in (?i)\'(?:ve|ll|t)\b like this: (?i)\'[^(?:ve|ll|t)]\b, but it didn't catch the 'v and 'w that are the desired goal.
How do I catch the substrings that follow a single quote but aren't in a list of pre-defined substrings, i.e. 'll, 've and 't?
I've also tried this, which didn't work either:
pattern = "(?i)\'(?:[^ve|ll|t|\s])\b"
The problem is that [^...] only matches a single character, not a substring.
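To make the character-class behaviour concrete, here is a small sketch using the string from the question. The variable names are only illustrative. It also shows a separate pitfall in the last attempt: without the r prefix, Python turns \b into a backspace character before the regex engine ever sees it.

```python
import re

x = ("I'll be going home I've the 'v ' isn't want I want to split "
     "but I want to catch tokens like 'v and 'w ' .")

# Inside [...], the characters ( ? : v e | l t ) are plain set members, so
# this means "a quote, then ONE character that is none of those".
char_class = re.compile(r"(?i)'[^(?:ve|ll|t)]\b")
char_class_matches = char_class.findall(x)

# v is a member of the negated set, so 'v can never match.
print("'v" in char_class_matches)  # False

# In a non-raw string literal, "\b" is the backspace character (0x08),
# not the regex word-boundary escape.
print("\b" == "\x08")  # True
```

So negating with [^...] excludes individual characters, which is why it cannot express "not followed by ve, ll or t".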
Maybe this one will work?
\'(?!ve|ll|t|\s)\w+
You can use a negative lookahead assertion to filter out what you don't want.
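A quick check of the lookahead pattern against the string from the question might look like the sketch below. The second pattern is my own assumed variant, not part of the original suggestion: it adds \b so that only the whole suffixes 've, 'll and 't are excluded, rather than anything that merely starts with those letters (e.g. 'very).

```python
import re

x = ("I'll be going home I've the 'v ' isn't want I want to split "
     "but I want to catch tokens like 'v and 'w ' .")

# A quote that is NOT followed by ve, ll, t or whitespace, then word chars.
lookahead = re.compile(r"'(?!ve|ll|t|\s)\w+")
print(lookahead.findall(x))  # ["'v", "'v", "'w"]

# Assumed variant: \b restricts the exclusion to complete suffixes.
strict = re.compile(r"(?i)'(?!(?:ve|ll|t)\b)\w+")
print(strict.findall(x))                  # ["'v", "'v", "'w"]

sample = "'ve 'very 'v"
print(lookahead.findall(sample))          # ["'v"]
print(strict.findall(sample))             # ["'very", "'v"]
```

The \s alternative in the first pattern is harmless but not strictly needed, since \w+ already requires a word character right after the quote.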
Note that in many regex engines (Python's re among them) it is the lookbehind assertion that must be fixed length, not the lookahead.
That means (?<!ve|t) is invalid there, as ve and t have two different lengths, whereas (?!ve|t) is fine.
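One way to see how Python's re treats alternatives of different lengths inside lookaround assertions is simply to compile both forms. In this sketch the lookbehind pattern (?<!ve|t)x is a made-up example, not something from the question. As far as I know, CPython's re rejects it while accepting the lookahead form; other engines may behave differently.

```python
import re

# Lookahead with alternatives of different lengths compiles without complaint.
re.compile(r"'(?!ve|t)\w+")

# Lookbehind with alternatives of different lengths is rejected by re.
try:
    re.compile(r"(?<!ve|t)x")
    lookbehind_ok = True
except re.error as exc:
    lookbehind_ok = False
    print("lookbehind rejected:", exc)
```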