regex python-3.x word-boundary

In Python, how do I replace either 'a' or 'an' with a number indicating more than one?


Note: This question is different from "Does re in Python support word boundaries (\b)". That question asks something simple, which a cursory glance at any tutorial on Python regular expressions would have answered with examples. My question is about using a word boundary around an OR expression; it is far from trivial and should not be treated as a duplicate.

I was trying to build a palatable example to demonstrate regex word boundaries. To that end, I wanted to show how the singular food items ordered by a diet-conscious person change for a guzzler, and wrote the following program:

import re
items_lean = 'a masala dosa, an idli and a mango lassi'
pattern = r'{}'.format('an|a') # Use pattern as dynamic variable in regex
items_fat = re.sub(pattern, 'four', items_lean) # OOPS
print(items_fat)
pattern_fat = r'{}'.format('\ban\b|\ba\b') # Ensure a or an occurs as a word by itself
items_fat_proper = re.sub(pattern_fat, 'four', items_lean)
print(items_fat_proper)

I expected the following outputs, one for each print statement:

four mfoursfourlfour dosfour, four idli fourd four mfourgo lfourssi
four masala dosa, four idli and four mango lassi

But, what I got was:

four mfoursfourlfour dosfour, four idli fourd four mfourgo lfourssi
a masala dosa, an idli and a mango lassi

Where should the \b be placed to get the desired output?


Solution

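  • To satisfy the guzzlers, you need to either escape the backslashes (write \\b) or use a raw string literal. In an ordinary string literal, Python reads '\b' as the backspace character, so the regex engine never sees a word boundary. The r prefix in r'{}' applies only to the '{}' literal, not to the argument passed to format. Put the r prefix on the pattern itself, i.e.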

    pattern_fat = r'\ban\b|\ba\b'
    

    I've also removed the superfluous format call, which I suspect caused this confusion!
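
To see the difference side by side, here is a minimal sketch that reuses items_lean from the question. The names broken, fixed and grouped are illustrative only, and word_pattern is a hypothetical helper (not part of the re module) for the case where the words really do come from a variable. It also shows the boundaries written once around the OR expression, which gives the same result here.

```python
import re

items_lean = 'a masala dosa, an idli and a mango lassi'

# '\b' in an ordinary string literal is the backspace character (0x08).
broken = r'{}'.format('\ban\b|\ba\b')  # the r prefix applies only to '{}'
fixed = r'\ban\b|\ba\b'                # raw string: the regex engine sees \b
grouped = r'\b(?:an|a)\b'              # one pair of boundaries around the alternation


def word_pattern(words):
    """Hypothetical helper: build a whole-word pattern from a list of words."""
    return r'\b(?:{})\b'.format('|'.join(re.escape(w) for w in words))


result_broken = re.sub(broken, 'four', items_lean)
result_fixed = re.sub(fixed, 'four', items_lean)
result_grouped = re.sub(grouped, 'four', items_lean)
result_dynamic = re.sub(word_pattern(['an', 'a']), 'four', items_lean)

print(result_broken)   # a masala dosa, an idli and a mango lassi
print(result_fixed)    # four masala dosa, four idli and four mango lassi
print(result_grouped)  # four masala dosa, four idli and four mango lassi
print(result_dynamic)  # four masala dosa, four idli and four mango lassi
```

If the words are supplied at run time, passing each one through re.escape before joining keeps any regex metacharacters in them from being interpreted as part of the pattern.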