
re.sub() with a non-empty substitution eats up following character in Python

I'm trying to split a word between two adjacent vowels by inserting a non-alphabetic group of characters. When I use re.sub() with a non-empty replacement string, the insertion appears in the result, but it seems to have "eaten up" the character that follows it.

Here's an example:

import re

word = "aorta"

re.sub(r'(?<=[AEOUaeouy])(?:[aeoui])', '[=]', word)
#actual output => 'a[=]rta'
#expected output => 'a[=]or[=]ta'

Why is the character following the insertion eaten up?

Solution

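You should use a positive lookahead, not a non-capturing group. A non-capturing group (?:...) is a consuming pattern: the characters it matches become part of the match value, and re.sub replaces the whole match value, so the second vowel ("o" in "aorta") is replaced by "[=]". A positive lookahead (?=...) is non-consuming: it only checks that the characters are present after the current position, without adding them to the match value, so they stay in the output.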
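To see the difference directly, here is a small sketch that runs both variants through re.search and prints each match's span and value. The consuming version's match covers the "o"; the lookahead version's match is an empty position, so there is nothing for re.sub to remove.

```python
import re

word = "aorta"

# Non-capturing group: the second vowel is consumed and becomes the match value
consuming = re.search(r'(?<=[AEOUaeouy])(?:[aeoui])', word)
print(consuming.span(), repr(consuming.group()))  # (1, 2) 'o'

# Positive lookahead: nothing is consumed, so the match is empty
zero_width = re.search(r'(?<=[AEOUaeouy])(?=[aeoui])', word)
print(zero_width.span(), repr(zero_width.group()))  # (1, 1) ''
```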

Use

import re
word = "aorta"
print(re.sub('([AEOUaeouy])(?=[aeoui])', r'\1[=]', word))
# => a[=]orta


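Note: if you wish to get 'a[=]or[=]ta', adding r to the Group 1 character class is not enough, because the r in "aorta" is followed by t, which the lookahead rejects. Add r to the Group 1 class and t to the lookahead class: ([AEOUaeouyr])(?=[aeouit]).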
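A quick check of both variants, as a sketch. The second pattern is just one rule that happens to produce the expected output for "aorta"; whether it matches the splitting rule you actually want for other words is an assumption you should verify.

```python
import re

word = "aorta"

# Adding only "r" to Group 1: the "r" is followed by "t", which [aeoui] rejects
only_r = re.sub(r'([AEOUaeouyr])(?=[aeoui])', r'\1[=]', word)
print(only_r)  # => a[=]orta

# Adding "r" to Group 1 and "t" to the lookahead class
r_and_t = re.sub(r'([AEOUaeouyr])(?=[aeouit])', r'\1[=]', word)
print(r_and_t)  # => a[=]or[=]ta
```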

Details

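([AEOUaeouy]) - Group 1: matches and captures any one of the characters in the class
(?=[aeoui]) - a positive lookahead: matches a position immediately followed by one of the characters in the class, without consuming that character
\1 - in the replacement pattern, re-inserts the value captured by Group 1, so the vowel consumed by Group 1 is restored before [=]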
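As an alternative (not the approach used in this answer), you can keep the question's lookbehind and only swap the non-capturing group for a lookahead. Both assertions are zero-width, so each match is the empty position between two vowels and no \1 backreference is needed. A minimal sketch, reusing the question's character classes:

```python
import re

# Empty insertion point: preceded by a vowel and followed by a vowel
pattern = r'(?<=[AEOUaeouy])(?=[aeoui])'

# Nothing is consumed, so the replacement only inserts "[=]"
aorta = re.sub(pattern, '[=]', "aorta")
aeon = re.sub(pattern, '[=]', "aeon")
print(aorta)  # => a[=]orta
print(aeon)   # => a[=]e[=]on
```

For these inputs it gives the same result as the Group 1 version; the lookbehind variant just avoids capturing and restoring the first vowel.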