How can I delete specific duplicated characters from a string in Python, but only when they appear one after another? For example:
I have the string
string = "Hello _my name is __Alex"
I need to remove the duplicate _ only where it appears consecutively (__), and get a string like this:
string = "Hello _my name is _Alex"
If I use set, I get this:
string = "_yoiHAemnasxl"
(Big edit: oops, I missed that you only want to de-duplicate certain characters and not others. Retrofitting solutions...)
I assume you have a string that contains all the characters you want to de-duplicate. Let's call it to_remove, and say that it's equal to "_.-". So only underscores, periods, and hyphens will be de-duplicated.
You could use a regex to match multiple successive repeats of a character, and replace them with a single character.
>>> import re
>>> to_remove = "_.-"
>>> s = "Hello... _my name -- is __Alex"
>>> pattern = "(?P<char>[" + re.escape(to_remove) + "])(?P=char)+"
>>> re.sub(pattern, r"\1", s)
'Hello. _my name - is _Alex'
Quick breakdown:
?P<char> assigns the symbolic name char to the first group.
to_remove goes inside the character set, []. It's necessary to call re.escape because hyphens and other characters may otherwise have special meaning inside the set.
(?P=char) refers back to the character matched by the named group "char".
+ matches one or more repetitions of that character.
So in aggregate, this means "match any character from to_remove that appears more than once in a row". The second argument to sub, r"\1", then replaces that match with the first group, which is only one character long.
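If you need this in more than one place, the regex can be wrapped in a small helper. This is only a sketch: the name collapse_repeats and its parameters are made up for illustration, and the guard for an empty character string is my own addition (an empty [] set would not compile).

```python
import re

def collapse_repeats(text, chars):
    """Collapse runs of any character in `chars` down to one occurrence.

    `collapse_repeats` is a hypothetical helper name, not a library function.
    """
    if not chars:
        # Nothing to de-duplicate; an empty character set would be a regex error.
        return text
    # Group 1 captures one character from the set; \1+ matches its repeats.
    pattern = re.compile("([" + re.escape(chars) + r"])\1+")
    return pattern.sub(r"\1", text)

print(collapse_repeats("Hello... _my name -- is __Alex", "_.-"))
```

Characters outside chars, such as the double l in "Hello", are left alone.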
Alternative approach: write a generator expression that skips a character only when it is in to_remove and equals the character preceding it.
>>> "".join(s[i] for i in range(len(s)) if i == 0 or not (s[i-1] == s[i] and s[i] in to_remove))
'Hello. _my name - is _Alex'
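If the one-liner is hard to read, the same logic can be spelled out as a loop that remembers the previous character. A rough equivalent, with dedupe_adjacent being a name invented here:

```python
def dedupe_adjacent(text, chars):
    """Drop a character when it repeats the previous one and is in `chars`.

    `dedupe_adjacent` is a hypothetical name used only for this sketch.
    """
    result = []
    previous = None  # no previous character before the first one
    for ch in text:
        if ch == previous and ch in chars:
            continue  # consecutive duplicate of a character we de-duplicate
        result.append(ch)
        previous = ch
    return "".join(result)

print(dedupe_adjacent("Hello... _my name -- is __Alex", "_.-"))
```

Unlike the indexed version, this only iterates over the input, so it should also work on any iterable of characters, not just a string.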
Alternative approach #2: use groupby to identify groups of consecutive identical characters, then join the values together, using to_remove membership testing to decide how many values should be added.
>>> import itertools
>>> "".join(k if k in to_remove else "".join(v) for k,v in itertools.groupby(s))
'Hello. _my name - is _Alex'
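To see why this works, it may help to look at what groupby actually yields. The short sample string and variable names below are mine, chosen for illustration. Called without a key function, groupby groups on the elements themselves.

```python
import itertools

sample = "a__b--"   # illustrative input, not from the question
chars = "_-"        # characters to de-duplicate

# Each item is (key, iterator over the run of consecutive equal characters).
groups = [(key, "".join(run)) for key, run in itertools.groupby(sample)]
print(groups)

# Keep one character for runs we de-duplicate, the whole run otherwise.
collapsed = "".join(key if key in chars else run for key, run in groups)
print(collapsed)
```

Each run of identical characters arrives as one group, so emitting just the key for groups whose character is in chars collapses the run to a single character.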
Alternative approach #3: call re.sub once for each member of to_remove. A bit expensive if to_remove contains a lot of characters.
>>> for c in to_remove:
... s = re.sub(rf"({re.escape(c)})\1+", r"\1", s)
...
>>> s
'Hello. _my name - is _Alex'