import re
input_text = "Creo que ((PERS)los viejos gabinetes) estan en desuso, hay que hacer algo con ellos. ellos quedaron en el deposito de afuera, lloviznó temprano por lo que ((PERS)los viejos gabinetes) fueron llevados a la sala principal."
pattern_01 = r"((PERS)\s*los\s[\w\s]+)(\.)"
output = re.sub(pattern_01, r"\1, \1\3", input_text, flags = re.IGNORECASE)
print(output)
I want to replace every "ellos" substring that appears after a ((PERS)\s*los ...) sequence and before the first period (.) that follows it. The replacement is the content inside those brackets (the "los ..." phrase), so the bracketed sequence must occur before that occurrence of "ellos".
Running this code as-is does not modify the string, but I need to get this output:
"Creo que ((PERS)los viejos gabinetes) estan en desuso, hay que hacer algo con los viejos gabinetes. ellos quedaron en el deposito de afuera, lloviznó temprano por lo que ((PERS)los viejos gabinetes) fueron llevados a la sala principal."
The number of replacements is not known in advance: there may be more than one "ellos" between the ((PERS)los ...) sequence and the first period (.) after it.
You could try the following:
import re
re_block = re.compile(
    r"""
    (                   # group 1: the whole tag, kept unchanged
        \(
        \(PERS\)\s*
        ( los[^\)]* )   # group 2: the replacement string
        \)
    )
    ( [^\.]* )          # group 3: text up to the next period, where `ellos` gets replaced
    """,
    re.VERBOSE
)
re_ellos = re.compile(r"\bellos\b")

def repl(match):
    # Keep the tag, then substitute every `ellos` in the text that follows it.
    return match[1] + re_ellos.sub(match[2], match[3])

output_text = re_block.sub(repl, input_text)
print(output_text)
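
As for why the original pattern changes nothing: unescaped ( and ) are grouping metacharacters, so ((PERS)\s*los ...) looks for the letters PERS followed directly by optional whitespace and "los". In the input, PERS is followed by a literal ")", so there is never a match. Escaping the parentheses as \( and \) fixes that. Below is a minimal self-contained sketch of the same approach that also shows several "ellos" being replaced in one sentence. The names replace_ellos, BLOCK_RE, ELLOS_RE and the sample string are made up for illustration.

```python
import re

# Group 1: the literal "((PERS)los ...)" tag, group 2: the phrase inside it,
# group 3: everything after the tag up to (not including) the next period.
BLOCK_RE = re.compile(r"(\(\(PERS\)\s*(los[^)]*)\))([^.]*)")
ELLOS_RE = re.compile(r"\bellos\b")


def replace_ellos(text):
    """Replace each 'ellos' between a ((PERS)los ...) tag and the next period."""
    def repl(match):
        tag, phrase, tail = match.group(1), match.group(2), match.group(3)
        # The tag is kept; only the text after it is rewritten.
        return tag + ELLOS_RE.sub(phrase, tail)
    return BLOCK_RE.sub(repl, text)


# Hypothetical sample with two "ellos" before the period and one after it.
sample = "((PERS)los viejos gabinetes) estan rotos, ellos y solo ellos. ellos no."
print(replace_ellos(sample))
```

Only the two "ellos" before the first period are replaced. The one after it is left alone because group 3 stops at the period. Note that this sketch is case-sensitive. Pass re.IGNORECASE to both patterns if "Ellos" or "(pers)" should also match.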