python, python-3.x, regex, replace, regex-group

Replace a substring with another when it occurs after one pattern and before another


import re

input_text = "Creo que ((PERS)los viejos gabinetes) estan en desuso, hay que hacer algo con ellos. ellos quedaron en el deposito de afuera, lloviznó temprano por lo que ((PERS)los viejos gabinetes) fueron llevados a la sala principal."

pattern_01 = r"((PERS)\s*los\s[\w\s]+)(\.)"
output = re.sub(pattern_01, r"\1, \1\3", input_text, flags = re.IGNORECASE)

print(output)

I want to replace every "ellos" substring that appears between a ((PERS)\s*los ) sequence and the first period . after it. Each one should be replaced with the content inside those brackets ((PERS)\s*los ), which must occur before that "ellos".

Running this code as is does not modify the string.

I need to get this output instead:

"Creo que ((PERS)los viejos gabinetes) estan en desuso, hay que hacer algo con los viejos gabinetes. ellos quedaron en el deposito de afuera, lloviznó temprano por lo que ((PERS)los viejos gabinetes) fueron llevados a la sala principal."

The number of replacements is not known in advance: there may be more than one "ellos" between ((PERS)\s*los ) and the first period . after it.


Solution

  • You could try the following:

    import re
    
    re_block = re.compile(
        r"""
          (                  # Group 1: the whole ((PERS)...) marker, captured so it is kept
            \(
              \(PERS\)\s*
              ( los[^\)]* )  # Group 2: the replacement string
            \)
          )
          ( [^\.]* )         # Group 3: text up to the next period, where `ellos` gets replaced
        """,
        re.VERBOSE
    )
    re_ellos = re.compile(r"\bellos\b")
    
    def repl(match):
        return match[1] + re_ellos.sub(match[2], match[3])
    
    output_text = re_block.sub(repl, input_text)
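
    As a self-contained check, here is a minimal sketch of the same idea on a shortened input with two "ellos" before the first period. The names `naive`, `block`, `ellos` and `replace_ellos`, and the shortened sentence, are illustrative and not from the question. It also shows the likely reason the original pattern changed nothing: unescaped parentheses are capture groups, so that pattern looks for the literal text "PERSlos" rather than "(PERS)los".

```python
import re

# Shortened, hypothetical input with two "ellos" before the first period.
input_text = (
    "Creo que ((PERS)los viejos gabinetes) estan en desuso, "
    "hay que hacer algo con ellos y con ellos mismos. "
    "ellos quedaron en el deposito de afuera."
)

# Pattern from the question: the parentheses are unescaped, so they act as
# groups and the regex searches for "PERS" directly followed by "los".
naive = re.compile(r"((PERS)\s*los\s[\w\s]+)(\.)", re.IGNORECASE)

# Escaped version: group 1 is the whole "((PERS)...)" marker, group 2 the
# phrase inside it, group 3 the text up to (not including) the next period.
block = re.compile(r"(\(\(PERS\)\s*(los[^)]*)\))([^.]*)")
ellos = re.compile(r"\bellos\b")


def replace_ellos(match):
    phrase = match[2]
    # Passing a function as the replacement inserts `phrase` literally, so
    # backslashes in the phrase are not treated as escape sequences.
    return match[1] + ellos.sub(lambda _: phrase, match[3])


output_text = block.sub(replace_ellos, input_text)

print(naive.search(input_text))  # None: the original pattern never matches
print(output_text)
```

    Both "ellos" before the first period are replaced, while the one after the period is left alone because group 3 stops at the period.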