import re
sstring = "ON Any ON Any"
regex1 = re.compile(r''' \bON\bANY\b''', re.VERBOSE)
regex2 = re.compile(r'''\b(ON)?\b(Any)?''', re.VERBOSE)
regex3 = re.compile(r'''\b(?:ON)?\b(?:Any)?''', re.VERBOSE)
for a in regex1.findall(sstring): print(a)
print("----------")
for a in regex2.findall(sstring): print(a)
print("----------")
for a in regex3.findall(sstring): print(a)
print("----------")
Output:
----------
('ON', '')
('', '')
('', 'Any')
('', '')
('ON', '')
('', '')
('', 'Any')
('', '')
----------
ON

Any

ON

Any

----------
Having read many articles on the internet and on S.O., I think I still don't understand the regex word boundary \b.
The first regex doesn't give me the expected result. I think it should give me "ON Any ON Any", but it doesn't.
The second regex gives me tuples, and I don't know why, nor do I understand the meaning of ('', '').
The third regex prints the results on separate lines, with empty lines in between.
Could you please help me understand this?
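Note that to match ON ANY you need to add an escaped space between ON and ANY (escaped because you are using the re.VERBOSE flag, which ignores unescaped whitespace in the pattern). The \b word boundary is a zero-width assertion: it does not consume any text; it only asserts a position between a word character and a non-word character (or the start/end of the string). That is why your first approach, re.compile(r''' \bON\bANY\b''', re.VERBOSE), fails: nothing in the pattern consumes the space between ON and Any. The pattern also spells ANY in upper case while your string contains Any, so without re.IGNORECASE it would not match even with the space.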
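A small sketch of what "zero-width" means in practice, using only the standard re module and a shortened sample string: every \b match is an empty string at a boundary position, and the pattern only matches across the space once something actually consumes it (a plain space here, since re.VERBOSE is not used in this snippet).

```python
import re

text = "ON Any"

# \b is zero-width: each match is an empty string; only its position matters.
boundaries = [m.start() for m in re.finditer(r"\b", text)]
print(boundaries)  # [0, 2, 3, 6]

# \b alone cannot get from "ON" to "Any": it asserts a position but consumes nothing,
# so this pattern would need "ONAny" written contiguously.
no_space = re.search(r"\bON\bAny\b", text)
print(no_space)  # None

# Consuming the space explicitly makes the match succeed.
with_space = re.search(r"\bON Any\b", text)
print(with_space.group())  # ON Any
```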
Use
rx = re.compile(r''' \bON\ ANY\b ''', re.VERBOSE|re.IGNORECASE)
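A minimal check of the suggested pattern against the string from the question. The case-sensitive variant is included only to show that re.IGNORECASE is what lets ANY in the pattern match Any in the input.

```python
import re

sstring = "ON Any ON Any"

# In VERBOSE mode unescaped whitespace is ignored, so the space is written as "\ ".
# IGNORECASE lets "ANY" in the pattern match "Any" in the input.
rx = re.compile(r''' \bON\ ANY\b ''', re.VERBOSE | re.IGNORECASE)
print(rx.findall(sstring))  # ['ON Any', 'ON Any']

# Same pattern without IGNORECASE: "ANY" never matches "Any", so nothing is found.
rx_case_sensitive = re.compile(r''' \bON\ ANY\b ''', re.VERBOSE)
print(rx_case_sensitive.findall(sstring))  # []
```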
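With re.compile(r'''\b(ON)?\b(Any)?''', re.VERBOSE), findall returns tuples because you defined two (...) capturing groups in the pattern. When a pattern has more than one capturing group, re.findall returns one tuple per match, with one item per group, and a group that did not take part in the match is reported as an empty string. So ('', '') is a match in which neither ON nor Any was matched, i.e. an empty match at a word boundary.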
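To see where each tuple comes from, the sketch below runs the second pattern from the question through both findall and finditer. It assumes Python 3.7 or later, where an empty match is allowed right after a non-empty one (this is what produces the ('', '') entries between the words). finditer shows the span of each match and the raw groups, where a non-participating group is None rather than ''.

```python
import re

sstring = "ON Any ON Any"
two_groups = re.compile(r'''\b(ON)?\b(Any)?''', re.VERBOSE)

# Two capturing groups: findall yields one (group1, group2) tuple per match,
# with '' for a group that did not participate.
found = two_groups.findall(sstring)
print(found)

# finditer exposes the same matches as match objects: the span of each match,
# the whole matched text, and the raw groups (None for non-participating groups).
for m in two_groups.finditer(sstring):
    print(m.span(), repr(m.group(0)), m.groups())
```

Empty spans such as (2, 2) are the positions where only the \b assertions succeeded, which is exactly where findall reports ('', '').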
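The re.compile(r'''\b(?:ON)?\b(?:Any)?''', re.VERBOSE) pattern uses non-capturing (?:...) groups, so findall returns the whole matched text instead of tuples. It matches optional sequences, either ON or Any, so you get those words as values. You also get empty values, because this regex can match just a word boundary (all the other subpatterns are optional), and printing an empty string is what produces the blank lines in your output.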
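A short sketch of the empty matches and one way to avoid them, assuming Python 3.7 or later (same empty-match behavior as in the question's output). The words_only pattern is an illustrative alternative, not taken from the answer above: it requires one of the words instead of making both parts optional, so no empty match is possible.

```python
import re

sstring = "ON Any ON Any"
optional_parts = re.compile(r'''\b(?:ON)?\b(?:Any)?''', re.VERBOSE)

# Everything after the first \b is optional, so the pattern also matches the empty
# string at word boundaries; the question's print loop shows each '' as a blank line.
print(optional_parts.findall(sstring))  # ['ON', '', 'Any', '', 'ON', '', 'Any', '']

# Alternative: require one of the words, so every match consumes at least one character.
words_only = re.compile(r'''\b(?:ON|Any)\b''', re.VERBOSE)
print(words_only.findall(sstring))  # ['ON', 'Any', 'ON', 'Any']
```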
More details about word boundaries: