python, python-3.x, regex, replace, regex-group

How to make the re.sub() method replace a word only if this word is NOT preceded by a number \d?


import re

input_str = "esta a mil o a 25 mil 500 millas de la ciudad de milan" #Example 1
input_str = "mil o mas para pasar! y tu tienes menos de mil" #Example 2
input_str = "hace casi mil o 2mil anos aparecio un fenomeno legendario tan solo a millas de aqui" #Example 3

How do I make it do the replacement only if the word "mil" is NOT preceded by a digit (\d)? My current attempt does the opposite, since its pattern requires a digit before "mil":

input_str = re.sub(r"\d[\s|]*(?:mil)", " 1 mil" , input_str)

The output I need for these 3 examples:

"esta a 1 mil o a 25 mil 500 millas de la ciudad de milan" # for Example 1

"1 mil o mas para pasar! y tu tienes menos de 1 mil" # for Example 2

"hace casi 1 mil o 2mil anos aparecio un fenomeno legendario tan solo a millas de aqui" # for Example 3

Solution

  • If I understand correctly, the regex would be:

    Meaning that you want to match "mil" only if it is preceded by neither a digit (\d) nor a digit followed by a whitespace character (\d\s). I'm using two negative lookbehinds since Python's re module doesn't support variable-length lookbehind.
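The regex itself did not survive in the text above, so here is a minimal sketch of what the description suggests, not necessarily the exact pattern from the answer. It assumes two fixed-width negative lookbehinds plus word boundaries (the \b anchors are my assumption, added so that "millas" and "milan" are left alone), and the helper name add_leading_one is hypothetical.

```python
import re

# Assumed pattern, reconstructed from the description above:
#   (?<!\d)    "mil" is not immediately preceded by a digit        (e.g. "2mil")
#   (?<!\d\s)  "mil" is not preceded by a digit plus one whitespace (e.g. "25 mil")
#   \bmil\b    "mil" as a whole word, so "millas" and "milan" do not match
PATTERN = re.compile(r"(?<!\d)(?<!\d\s)\bmil\b")


def add_leading_one(text):
    """Replace a standalone 'mil' with '1 mil' unless a number already precedes it."""
    return PATTERN.sub("1 mil", text)


examples = [
    "esta a mil o a 25 mil 500 millas de la ciudad de milan",
    "mil o mas para pasar! y tu tienes menos de mil",
    "hace casi mil o 2mil anos aparecio un fenomeno legendario tan solo a millas de aqui",
]

for s in examples:
    print(add_leading_one(s))
```

Note that the replacement string is "1 mil" rather than " 1 mil", because the space before "mil" is not part of the match. Since each lookbehind has a fixed width, this sketch only covers zero or one whitespace character between the digit and "mil"; an input such as "25  mil" with two spaces would still be replaced.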
