Tags: python, regex, python-re

Replace a string using re.sub only if the prefix and suffix match


I am trying to convert German words to English using a custom dictionary. In the code below, a replacement should happen only if the characters around the matching word (its prefix and suffix) are among the following:

[,\/!?()_1234567890-=+."""' "]

For example:

The first Mein should be converted, but not the Mein in MeinName, because it is not surrounded by the characters listed above. Standalone occurrences such as _Mein or Mein. should be converted.

import re

string = "Mein ,Name, ist John, Wo23 bist+ ,_du? , MeinName "
replacements = {
    'Mein': 'my',
    'ist': 'is',
    'Wo': 'where',
    'bist': 'are',
    'du': 'you',
    'is': 'iis'
}
# Naive attempt: replaces every occurrence, including the Mein inside MeinName
print(re.sub(
    '({})'.format('|'.join(map(re.escape, replacements.keys()))),
    lambda m: replacements[m.group()],
    string
))

Expected output:

my ,Name, is John, where23 are+ ,_you? , MeinName 

Solution

  • You can use

    import re

    s = "Mein ,Name, ist John, Wo23 bist+ ,_du? , MeinName "
    replacements = {"Mein": "my", "ist": "is", "Wo": "where", "bist": "are", "du": "you", "is": "iis"}
    # Any dictionary key, but only when followed by one of the allowed delimiter characters
    rx = r'(?:{})(?=[,/!?()_0-9\-=+."\s\'])'.format('|'.join(map(re.escape, replacements.keys())))
    print(rx)
    print(re.sub(rx, lambda m: replacements[m.group()], s))
    # => my ,Name, is John, where23 are+ ,_you? , MeinName 
    


    The regex will look like

    (?:Mein|ist|Wo|bist|du|is)(?=[,/!?()_0-9\-=+."\s\'])
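To see which occurrences this pattern actually picks, here is a minimal sketch using re.finditer on the sample string from the question. The names words and matches are only illustrative. It also prints the character right after each match, which the lookahead inspects but does not include in the match.

```python
import re

s = "Mein ,Name, ist John, Wo23 bist+ ,_du? , MeinName "
words = ["Mein", "ist", "Wo", "bist", "du", "is"]

# Same pattern as above: a key, followed (without consuming) by a delimiter
rx = re.compile(
    r'(?:{})(?=[,/!?()_0-9\-=+."\s\'])'.format('|'.join(map(re.escape, words)))
)

# Collect (matched word, start index, end index) for every hit
matches = [(m.group(), m.start(), m.end()) for m in rx.finditer(s)]

for word, start, end in matches:
    # s[end] is the delimiter that satisfied the lookahead; it is not part of the match
    print(word, (start, end), repr(s[end]))
```

The Mein inside MeinName never shows up in the list because it is followed by N, which is not in the character class.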
    

    Details:

    • (?:Mein|ist|Wo|bist|du|is) - a non-capturing group that matches one of the dictionary keys
    • (?=[,/!?()_0-9\-=+."\s\']) - a positive lookahead that matches a location immediately followed by ,, /, !, ?, (, ), _, a digit, -, =, +, ., ", whitespace, or '. The lookahead does not consume that character, so it stays in the output.
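Note that the pattern above checks only the character after the word. If the character before the word must be checked as well, and words at the very start or end of the string should still count, one possible variant is sketched below. It is an assumption about the intended behavior, not part of the original answer. The names DELIMS, build_pattern and translate are made up for this example. It uses a negative lookbehind and a negative lookahead on the negated character class, so "no neighboring character at all" is accepted too.

```python
import re

replacements = {"Mein": "my", "ist": "is", "Wo": "where",
                "bist": "are", "du": "you", "is": "iis"}

# Body of the delimiter character class from the answer (hypothetical helper name)
DELIMS = r',/!?()_0-9\-=+."\s\''


def build_pattern(words):
    # Try longer keys first so that "ist" is attempted before "is"
    alternation = "|".join(map(re.escape, sorted(words, key=len, reverse=True)))
    # (?<![^...]) : not preceded by a non-delimiter (start of string is fine)
    # (?![^...])  : not followed by a non-delimiter (end of string is fine)
    return re.compile(
        r"(?<![^{d}])(?:{alt})(?![^{d}])".format(d=DELIMS, alt=alternation)
    )


def translate(text, mapping=replacements):
    pattern = build_pattern(mapping)
    return pattern.sub(lambda m: mapping[m.group()], text)


print(translate("Mein ,Name, ist John, Wo23 bist+ ,_du? , MeinName "))
# => my ,Name, is John, where23 are+ ,_you? , MeinName 
print(translate("NameMein du"))
# => NameMein you
```

On the sample string the result is the same as with the lookahead-only pattern. The second call shows the difference: Mein glued to the end of another word is left alone, and du at the end of the string is still replaced.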