I'm trying to perform a number of replacements using re.sub(), except I want the first replacement to be different. One straightforward approach would be to run re.sub() twice, with count=1 for the first call, but because re.sub() allows the repl argument to be a function, we can do this in a single call:
import re
def repl(matchobj):
    global first_sub
    if first_sub:
        first_sub = False
        print(f"Replacing '{matchobj.group()}' at {matchobj.start()} with ':)'")
        return ":)"
    else:
        print(f"Deleting '{matchobj.group()}' at {matchobj.start()}")
        return ""
text = "hello123 world456"
first_sub = True
text = re.sub(r"\d+", repl, text)
# Output:
# Replacing '123' at 5 with ':)'
# Deleting '456' at 14
Unfortunately, this makes use of global, which isn't great. Is there a better way to do this?
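For comparison, the two-call approach mentioned above might look like the sketch below. It assumes the first replacement (":)") can never itself match the pattern; if it could, the second call would delete it again.

```python
import re

text = "hello123 world456"
# First call: replace only the first match.
text = re.sub(r"\d+", ":)", text, count=1)
# Second call: delete whatever matches remain.
text = re.sub(r"\d+", "", text)
print(text)  # hello:) world
```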
With an iterator, inspired by Andrej:
import re
text = "hello123 world456"
text = re.sub(
    r"\d+",
    lambda _, i=iter([":)"]): next(i, ""),
    text
)
print(text)
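The iterator version works because a default argument is evaluated once, when the lambda is created, and next() falls back to its second argument once the iterator is exhausted. A small sketch of what that implies if the function object is stored and reused (the names repl, first and second are only for illustration):

```python
import re

# The default argument is evaluated once, when the lambda is created,
# so every call to this particular function object shares one iterator.
repl = lambda _match, it=iter([":)"]): next(it, "")

first = re.sub(r"\d+", repl, "hello123 world456")
# The iterator is already exhausted here, so every match is deleted.
second = re.sub(r"\d+", repl, "foo1 bar2")

print(first)   # hello:) world
print(second)  # foo bar
```

Writing the lambda inline in the re.sub() call, as above, avoids this because a fresh lambda (and a fresh iterator) is created each time that statement runs.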
Or using a dict for the state:
import re
text = "hello123 world456"
text = re.sub(
    r"\d+",
    lambda m, d={0: ":)"}: d.pop(0, ""),
    text
)
print(text)
Or one like yours but with a closure:
import re
def repl():
    first_sub = True
    def repl(matchobj):
        nonlocal first_sub
        if first_sub:
            first_sub = False
            print(f"Replacing '{matchobj.group()}' at {matchobj.start()} with ':)'")
            return ":)"
        else:
            print(f"Deleting '{matchobj.group()}' at {matchobj.start()}")
            return ""
    return repl
text = "hello123 world456"
text = re.sub(r"\d+", repl(), text)
print(text)
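If you need this in more than one place, the iterator and closure ideas can be combined into a small factory. This is only a sketch: make_repl and its parameters firsts and rest are hypothetical names, not part of the re module.

```python
import re
from itertools import chain, repeat

def make_repl(firsts, rest=""):
    """Return a fresh replacement callable (hypothetical helper).

    The first matches get the strings in `firsts`, in order;
    every later match gets `rest`.
    """
    replacements = chain(firsts, repeat(rest))

    def repl(_match):
        # chain(..., repeat(rest)) never runs out, so no default is needed.
        return next(replacements)

    return repl

print(re.sub(r"\d+", make_repl([":)"]), "hello123 world456"))     # hello:) world
print(re.sub(r"\d+", make_repl(["A", "B"], "-"), "x1 y2 z3 w4"))  # xA yB z- w-
```

Each call to make_repl() builds its own iterator, so state is never shared between separate re.sub() calls.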