I would like to find and replace repeating words in a string, but only if they are next to each other or separated by a space. For example:
"<number> <number>" -> "<number>"
"<number><number>" -> "<number>"
but not
"<number> test <number>" -> "<number> test <number>"
I have tried this:
import re
re.sub(f"(.+)(?=\<number>+)","", label).strip()
but it would give the wrong result for the last test option.
Could you please help me with that?
You can use
re.sub(r"(<number>)(?:\s*<number>)+", r"\1", label).strip()
See the regex demo. Details:
- (<number>) - Group 1: a <number> string
- (?:\s*<number>)+ - one or more occurrences of the following sequence of patterns:
  - \s* - zero or more whitespaces
  - <number> - a <number> string

The \1 is the replacement backreference to the Group 1 value.
import re
text = '"<number> <number>", "<number><number>", not "<number> test <number>"'
print( re.sub(r"(<number>)(?:\s*<number>)+", r'\1', text) )
# => "<number>", "<number>", not "<number> test <number>"
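If you need the same behavior for tokens other than <number>, the pattern can be built from a variable. The following is only a sketch: the function name collapse_repeats and its token parameter are hypothetical names chosen for illustration, not part of the original question. It assumes the token should be matched literally, so it is passed through re.escape first.

```python
import re

def collapse_repeats(label, token="<number>"):
    """Collapse runs of `token` that are adjacent or separated only by whitespace.

    Hypothetical helper for illustration; `token` is matched as a literal string.
    """
    # re.escape keeps any regex metacharacters in the token from being interpreted
    tok = re.escape(token)
    # Group 1 captures the first token; the non-capturing group consumes the repeats
    pattern = rf"({tok})(?:\s*{tok})+"
    return re.sub(pattern, r"\1", label).strip()

print(collapse_repeats("<number> <number>"))       # => <number>
print(collapse_repeats("<number><number>"))        # => <number>
print(collapse_repeats("<number> test <number>"))  # => <number> test <number>
```

Note that \s* also matches tabs and newlines, so repeats split across lines are collapsed too. If only plain spaces should count as separators, replacing \s* with " *" in the pattern would be one way to restrict it.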