I am trying to convert German words to English using a custom dictionary. In the code below, a replacement should only happen if the character immediately before or after the matched word is one of these characters:
[,\/!?()_1234567890-=+."""' "]
For example, Mein should be converted where it stands alone, but not inside MeinName, because there it is not delimited by the characters listed above. Single words such as _Mein or Mein. still need to be converted.
import re
string = "Mein ,Name, ist John, Wo23 bist+ ,_du? , MeinName "
replacements = {
    'Mein': 'my',
    'ist': 'is',
    'Wo': 'where',
    'bist': 'are',
    'du': 'you',
    'is': 'iis'
}
re.sub(
    '({})'.format('|'.join(map(re.escape, replacements.keys()))),
    lambda m: replacements[m.group()],
    string
)
Expected output:
my ,name,is John,where23 are+,_you? ,MeinName
You can use a positive lookahead that checks the character following each match:
import re
s = "Mein ,Name, ist John, Wo23 bist+ ,_du? , MeinName "
replacements = {"Mein": "my", "ist": "is", "Wo": "where", "bist": "are", "du": "you", "is": "iis"}
# Match a dictionary key only when it is immediately followed by one of the delimiter characters
rx = r'(?:{})(?=[,/!?()_0-9\-=+."\s\'])'.format('|'.join(map(re.escape, replacements.keys())))
print(rx)
print(re.sub(rx, lambda m: replacements[m.group()], s))
# => my ,Name, is John, where23 are+ ,_you? , MeinName
The regex will look like
(?:Mein|ist|Wo|bist|du|is)(?=[,/!?()_0-9\-=+."\s\'])
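To see which occurrences this pattern picks up, one quick check (a minimal sketch using the sample string from the question) is to list the matched words. Because the lookahead is zero-width, the delimiter itself is not part of the match and stays in the string after replacement.

```python
import re

s = "Mein ,Name, ist John, Wo23 bist+ ,_du? , MeinName "
rx = r'(?:Mein|ist|Wo|bist|du|is)(?=[,/!?()_0-9\-=+."\s\'])'

# Collect only the matched text; the character checked by the lookahead is not consumed
matched = [m.group() for m in re.finditer(rx, s)]
print(matched)
# => ['Mein', 'ist', 'Wo', 'bist', 'du']
```

The Mein inside MeinName is absent from the list because it is followed by N, which is not in the character class.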
Details:
(?:Mein|ist|Wo|bist|du|is) - a non-capturing group matching one of the alternative strings
(?=[,/!?()_0-9\-=+."\s\']) - a positive lookahead matching a location that is immediately followed by one of: , / ! ? ) ( _ a digit - = + . " whitespace '
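Note that the lookahead only checks the character after the match, so a key at the very end of the string, or one glued to a preceding letter (as in Xist), is not handled the way the question describes. If the character before the match must also be a delimiter, one possible variation is to use two negative lookarounds with a negated character class, which also accept the start and end of the string. This is only a sketch: the `translate` helper and the `DELIMS` name are made up for illustration, and it assumes the same delimiter set applies on both sides.

```python
import re

# Delimiter set from the regex above; assumed here to apply on both sides of a word
DELIMS = r',/!?()_0-9\-=+."\s\''

def translate(text, replacements):
    # Try longer keys first so that "ist" is attempted before "is"
    keys = sorted(replacements, key=len, reverse=True)
    alternation = '|'.join(map(re.escape, keys))
    # (?<![^...]) : preceded by a delimiter or by the start of the string
    # (?![^...])  : followed by a delimiter or by the end of the string
    pattern = r'(?<![^{d}])(?:{alt})(?![^{d}])'.format(d=DELIMS, alt=alternation)
    return re.sub(pattern, lambda m: replacements[m.group()], text)

replacements = {"Mein": "my", "ist": "is", "Wo": "where", "bist": "are", "du": "you", "is": "iis"}
s = "Mein ,Name, ist John, Wo23 bist+ ,_du? , MeinName "

print(translate(s, replacements))
# => my ,Name, is John, where23 are+ ,_you? , MeinName
print(translate("Xist du", replacements))
# => Xist you
```

On the sample string the result is the same as with the lookahead-only pattern. The difference shows up in the second call: Xist is left alone because ist is preceded by a letter, and the trailing du is replaced even though nothing follows it.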