Note: This question is different from Does re in Python support word boundaries (\b). That question asks something simple, which a cursory glance at any Python regular expression tutorial would have answered with examples. My question is about using a word boundary around an OR expression, which is far from trivial and should not be treated as a duplicate.
I was trying to build a palatable example to demonstrate regex word boundaries. To that end, I wanted to show how the singular food items ordered by a diet-conscious person are changed for a guzzler, so I wrote the following program:
import re
items_lean = 'a masala dosa, an idli and a mango lassi'
pattern = r'{}'.format('an|a') # Use pattern as dynamic variable in regex
items_fat = re.sub(pattern, 'four', items_lean) # OOPS
print(items_fat)
pattern_fat = r'{}'.format('\ban\b|\ba\b') # Ensure a or an occurs as a word by itself
items_fat_proper = re.sub(pattern_fat, 'four', items_lean)
print(items_fat_proper)
I expected the following outputs, one for each print statement:
four mfoursfourlfour dosfour, four idli fourd four mfourgo lfourssi
four masala dosa, four idli and four mango lassi
But, what I got was:
four mfoursfourlfour dosfour, four idli fourd four mfourgo lfourssi
a masala dosa, an idli and a mango lassi
Where should the \b factor be placed to get the desired output?
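In order to satisfy the guzzlers, you need to escape the \bs (write \\b) or put the pattern itself in a raw string. In an ordinary string literal, \b is the backspace character, and the r prefix on r'{}' does not apply to the argument passed to format(). So use: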
pattern_fat = r'\ban\b|\ba\b'
I've also removed the superfluous format call, which I suspect caused this confusion!
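To make the difference concrete, here is a minimal sketch, assuming Python 3 and the items_lean string from the question. The names broken, fixed and grouped are introduced here only for illustration. It also shows a non-capturing group as one possible way to put a single pair of \b around the whole OR expression.

```python
import re

items_lean = 'a masala dosa, an idli and a mango lassi'

# In an ordinary string literal, '\b' is the backspace character (0x08),
# so the regex engine never sees a word-boundary assertion.
assert '\b' == '\x08'
# In a raw string, r'\b' stays as a backslash followed by the letter b.
assert r'\b' == '\\b'

# The r prefix covers only the '{}' literal, not the argument to format().
broken = r'{}'.format('\ban\b|\ba\b')

# Raw string: the regex engine receives \b as a word boundary.
fixed = r'\ban\b|\ba\b'

# Same idea with the boundaries factored out around the alternation.
grouped = r'\b(?:an|a)\b'

print(re.sub(broken, 'four', items_lean))   # no match, string is unchanged
print(re.sub(fixed, 'four', items_lean))
print(re.sub(grouped, 'four', items_lean))
```

The last two calls should both print the line the question was hoping for, while the first leaves the order untouched because the text contains no backspace characters.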