I'm trying to separate a word with two adjacent vowels by inserting a non-alphabetic group of characters. When I use re.sub()
with a non-empty substitution, the result shows the insertion but the insertion seems to have "eaten up" the following character.
Here's an example:
import re
word = "aorta"
re.sub('(?<=[AEOUaeouy])(?:[aeoui])', '[=]', word)
#actual output => 'a[=]rta'
#expected output => 'a[=]or[=]ta'
Why is the character following the insertion eaten up?
You should use a positive lookahead (a non-consuming pattern that only checks for the presence of some chars without adding them to the match value), not a non-capturing group (a consuming pattern that puts the matched chars into the match value, which re.sub then replaces).
Use
import re
word = "aorta"
print(re.sub('([AEOUaeouy])(?=[aeoui])', r'\1[=]', word))
# => a[=]orta
Note: if you wish to get 'a[=]or[=]ta', the pattern must also match between r and t. Add r to the first character class, [AEOUaeouy] => [AEOUaeouyr], and t to the lookahead character class, [aeoui] => [aeouit].
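As a quick check of the note above, here is a minimal sketch comparing the answer's pattern with a widened variant. The widened classes ([AEOUaeouyr] and [aeouit]) are an assumption about what the asker wants, namely a break between r and t as well; they are not taken from the question.

```python
import re

word = "aorta"

# Pattern from the answer: inserts only between two listed vowels.
vowels_only = re.sub(r'([AEOUaeouy])(?=[aeoui])', r'\1[=]', word)

# Assumed extension: also allow a break after "r" when "t" follows.
extended = re.sub(r'([AEOUaeouyr])(?=[aeouit])', r'\1[=]', word)

print(vowels_only)  # a[=]orta
print(extended)     # a[=]or[=]ta
```

Note that the widened classes also match other pairs, such as any listed vowel followed by t, so they would likely need tuning for real word lists.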
Details
([AEOUaeouy]) - Group 1: any one of the chars defined in the character class
(?=[aeoui]) - a position that is immediately followed by one of the chars in the character class
\1 - in the replacement pattern, inserts the value captured in Group 1.
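To see the difference between consuming and non-consuming patterns directly, the sketch below lists the match spans for the question's non-capturing group and for a lookahead. The variant that keeps the question's lookbehind and pairs it with a lookahead is an illustration, not the pattern used in the answer; since it matches an empty string, no capturing group or \1 is needed.

```python
import re

word = "aorta"

# Question's pattern: the non-capturing group consumes the second vowel.
consuming = re.compile(r'(?<=[AEOUaeouy])(?:[aeoui])')
# Illustrative variant: lookbehind + lookahead matches a position only.
zero_width = re.compile(r'(?<=[AEOUaeouy])(?=[aeoui])')

consumed = [(m.span(), m.group()) for m in consuming.finditer(word)]
positions = [(m.span(), m.group()) for m in zero_width.finditer(word)]

print(consumed)   # [((1, 2), 'o')]  the "o" is part of the match
print(positions)  # [((1, 1), '')]   empty match, nothing to replace away

eaten = consuming.sub('[=]', word)
kept = zero_width.sub('[=]', word)
print(eaten)  # a[=]rta
print(kept)   # a[=]orta
```

Because re.sub replaces the whole match value, the consumed "o" disappears in the first case, while the empty match in the second case leaves every original character in place.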