I am trying to use regular expressions to replace numeric ranges in text, such as "4-5", with the phrase "4 to 5".
The text also contains dates such as "2024-12-26" that should not be replaced (should be left as is).
The regular expression (\d+)(\-)(\d+) (attempt one below) is clearly wrong, because it falsely matches dates.
Using a negative lookahead expression, I came up with the regex (?!\d+\-\d+\-)(\d+)(\-)(\d+) instead (attempt two below), which correctly matches "4-5" while rejecting "2024-12-26".
However, attempt_two does not behave correctly in a re.subn() context, because although it rejects "2024-12-26", the search continues on to match (and replace) the substring "12-26":
import re
text = """
2024-12-26
4-5
78-79
"""
attempt_one = re.compile(r"(\d+)(\-)(\d+)")
attempt_two = re.compile(r"(?!\d+\-\d+\-)(\d+)(\-)(\d+)")
print("Attempt one:")
print(re.match(attempt_one, "4-5")) # Match: OK
print(re.match(attempt_one, "2024-12-26")) # Match: False positive
new_text, _ = re.subn(attempt_one, r"\1 to \3", text) # Incorrect substitution
print(new_text)
print("Attempt two:")
print(re.match(attempt_two, "4-5")) # Match: OK
print(re.match(attempt_two, "2024-12-26")) # Doesn't match: OK
new_text, _ = re.subn(attempt_two, r"\1 to \3", text) # Still incorrect
print(new_text)
Output:
Attempt one:
<re.Match object; span=(0, 3), match='4-5'>
<re.Match object; span=(0, 7), match='2024-12'>
2024 to 12-26
4 to 5
78 to 79
Attempt two:
<re.Match object; span=(0, 3), match='4-5'>
None
2024-12 to 26
4 to 5
78 to 79
What regular expression can I use so that the substitution returns the following instead?
2024-12-26
4 to 5
78 to 79
(As my goal is to learn about regular expressions, I am not interested in workarounds such as matching the whitespace or newline after "12-26".)
You need both a negative lookbehind and a negative lookahead, to prohibit an extra hyphen before or after the match.
(?<![-\d])(\d+)-(\d+)(?![-\d])
The lookarounds must also exclude digits, not just hyphens. Otherwise the engine can backtrack to a shorter run of digits and match a fragment of the date, e.g. 024-1 from 2024-12-26.
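As a quick check, here is a minimal sketch that applies the pattern above to the sample text from the question. The names range_pattern and hyphen_only are just illustrative, and the hyphen-only pattern is a hypothetical variant included only to show what goes wrong when the lookarounds do not exclude digits. Note that this pattern has two capture groups rather than three, so the replacement string becomes \1 to \2.

```python
import re

text = """
2024-12-26
4-5
78-79
"""

# Lookarounds reject a hyphen or a digit on either side of the candidate range.
range_pattern = re.compile(r"(?<![-\d])(\d+)-(\d+)(?![-\d])")

# Only two capture groups here, so the replacement uses \2 rather than \3.
new_text, count = range_pattern.subn(r"\1 to \2", text)
print(new_text)

# Hypothetical hyphen-only variant, shown only to illustrate why \d is needed.
hyphen_only = re.compile(r"(?<!-)(\d+)-(\d+)(?!-)")
print(hyphen_only.sub(r"\1 to \2", "2024-12-26"))
```

The first substitution should leave the date untouched and rewrite only the two ranges. With the hyphen-only variant, the engine backtracks to shorter digit runs that still satisfy the lookarounds, so pieces of the date get rewritten.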