TL;DR: How to get re.sub to print out what substitutions it makes, including when using groups?
Kind of like having a verbose option, is it possible to have re.sub print out a message every time it makes a replacement? This would be very helpful for testing how multiple re.sub calls interact with large texts.
I've managed to come up with this workaround for simple replacements, utilizing the fact that the repl argument can be a function:
import re

def replacer(text, verbose=False):
    def repl(matchobj, replacement):
        if verbose:
            print(f"Replacing {matchobj.group()} with {replacement}...")
        return replacement
    text = re.sub(r"[A-Z]+", lambda m: repl(m, "CAPS"), text)
    text = re.sub(r"\d+", lambda m: repl(m, "NUMBER"), text)
    return text

replacer("this is a 123 TEST 456", True)
# Log:
# Replacing TEST with CAPS...
# Replacing 123 with NUMBER...
# Replacing 456 with NUMBER...
However, this doesn't work for groups: it seems re.sub inserts the return value of repl literally instead of expanding group references such as \1:
def replacer2(text, verbose=False):
    def repl(matchobj, replacement):
        if verbose:
            print(f"Replacing {matchobj.group()} with {replacement}...")
        return replacement
    text = re.sub(r"([A-Z]+)(\d+)", lambda m: repl(m, r"\2\1"), text)
    return text

replacer2("ABC123", verbose=True)  # returns r"\2\1"
# Log:
# Replacing ABC123 with \2\1...
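To make the difference concrete, here is a minimal comparison of the two kinds of repl. The variable names are only illustrative; the behavior shown is that a string template has its group references expanded, while a function's return value is inserted as-is.

```python
import re

pattern = r"([A-Z]+)(\d+)"

# String template: re.sub expands \2 and \1 to the matched groups.
as_template = re.sub(pattern, r"\2\1", "ABC123")

# Callable: the returned string is inserted literally, backslashes included.
as_callable = re.sub(pattern, lambda m: r"\2\1", "ABC123")

print(as_template)  # 123ABC
print(as_callable)  # \2\1
```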
Of course, a more sophisticated repl function can be written that actually checks for groups in replacement, but at that point the solution seems too complicated for the goal of just getting re.sub to report on its substitutions. Another potential solution would be to just use re.search, report on that, then use re.sub to make the replacement. I had hoped to use the Pattern.sub variant with pos and endpos to save the sub function from having to search the whole string again, but those parameters are accepted by Pattern.search, Pattern.match and similar methods, not by Pattern.sub. Surely there's a better way than either of these options?
Use matchobj.expand(replacement), which will process the replacement string and make the substitutions:
import re

def replacer2(text, verbose=False):
    def repl(matchobj, replacement):
        # Expand group references such as \1 and \2 against this match.
        result = matchobj.expand(replacement)
        if verbose:
            print(f"Replacing {matchobj.group()} with {result}...")
        return result
    text = re.sub(r"([A-Z]+)(\d+)", lambda m: repl(m, r"\2\1"), text)
    return text

print(replacer2("ABC123", verbose=True))
Output:
Replacing ABC123 with 123ABC...
123ABC
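As a quick illustration of what expand accepts, the sketch below applies it to a single match using numbered, explicit \g<n>, and named references. The group names letters and digits are made up for this example.

```python
import re

# Hypothetical group names, chosen only for this demonstration.
m = re.search(r"(?P<letters>[A-Z]+)(?P<digits>\d+)", "ABC123")

numbered = m.expand(r"\2\1")                 # plain numeric references
explicit = m.expand(r"\g<2>-\g<1>")          # unambiguous numeric form
named = m.expand(r"\g<digits>:\g<letters>")  # named group references

print(numbered)  # 123ABC
print(explicit)  # 123-ABC
print(named)     # 123:ABC
```

The template syntax is the same one re.sub uses for a string repl, so a template that works there should expand the same way here.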
A generic example that extends re.sub with a verbose option and allows group references to be used by replacement functions:
import re

def sub2(pattern, repl, string, count=0, flags=0, verbose=False):
    def helper(match, repl):
        # Call repl if it is a function, then expand any group references.
        result = match.expand(repl(match) if callable(repl) else repl)
        if verbose:
            print(f'offset {match.start()}: {match.group()!r} -> {result!r}')
        return result
    # Pass count and flags by keyword; positional use is deprecated since Python 3.13.
    return re.sub(pattern, lambda m: helper(m, repl), string, count=count, flags=flags)

# reverse each run of three digits
print(sub2(r'(\d)(\d)(\d)', r'\3\2\1', 'abc123def45ghi789', verbose=True))
# reverse three digits, and wrap two digits in parentheses
print(sub2(r'(\d)(\d)(\d)?',
           lambda m: r'(\1\2)' if m.group(3) is None else r'\3\2\1',
           'abc123def45ghi789', verbose=True))
Output:
offset 3: '123' -> '321'
offset 14: '789' -> '987'
abc321def45ghi987
offset 3: '123' -> '321'
offset 9: '45' -> '(45)'
offset 14: '789' -> '987'
abc321def(45)ghi987
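When testing against large texts, it may be more convenient to collect the substitutions than to print them. The following is a sketch of one possible variant, not part of the re module: sub_with_log is a hypothetical name, and unlike sub2 it leaves a callable's return value unexpanded, matching what re.sub itself does.

```python
import re

def sub_with_log(pattern, repl, string, count=0, flags=0):
    """Like re.sub, but also return a list of (offset, old, new) tuples."""
    log = []

    def helper(match):
        # Expand group references only for string templates; a callable's
        # return value is used literally, as re.sub does.
        result = repl(match) if callable(repl) else match.expand(repl)
        log.append((match.start(), match.group(), result))
        return result

    new_string = re.sub(pattern, helper, string, count=count, flags=flags)
    return new_string, log

text, changes = sub_with_log(r"(\d)(\d)(\d)", r"\3\2\1", "abc123def45ghi789")
print(text)  # abc321def45ghi987
for offset, old, new in changes:
    print(f"offset {offset}: {old!r} -> {new!r}")
```

Returning the log keeps the function free of side effects, so the caller can decide whether to print it, filter it, or assert on it in a test.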