Removing specific duplicated characters from a string in Python

How can I delete specific duplicated characters from a string in Python, but only when they appear one after another? For example:

I have a string

string = "Hello _my name is __Alex"

I need to delete duplicate _ characters only when they appear one after another (__), and get a string like this:

string = "Hello _my name is _Alex"

If I use set, I get this:

string = "_yoiHAemnasxl"

Solution

(Big edit: oops, I missed that you only want to de-duplicate certain characters and not others. Retrofitting solutions...)

I assume you have a string that represents all the characters you want to de-duplicate. Let's call it to_remove, and say that it's equal to "_.-". So only underscores, periods, and hyphens will be de-duplicated.

You could use a regex to match a character that is repeated two or more times in a row, and replace the whole run with a single character.

>>> import re
>>> to_remove = "_.-"
>>> s = "Hello... _my name -- is __Alex"
>>> pattern = "(?P<char>[" + re.escape(to_remove) + "])(?P=char)+"
>>> re.sub(pattern, r"\1", s)
'Hello. _my name - is _Alex'

Quick breakdown:

?P<char> assigns the symbolic name char to the first group.
We put to_remove inside a character set, []. Calling re.escape is necessary because hyphens, carets, closing brackets, and backslashes would otherwise have special meaning inside the set.
(?P=char) refers back to the character matched by the named group "char".
The + matches one or more further repetitions of that character, so the whole pattern needs at least two in a row.

So in aggregate, this means "match any character from to_remove that appears more than once in a row". The second argument to sub, r"\1", then replaces that match with the first group (the one named char), which is only one character long.
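For reuse, the same pattern can be wrapped in a small function. This is only a sketch: the name squeeze_repeats and the early return for an empty character set are my own additions, not something the session above requires.

```python
import re

def squeeze_repeats(text, chars):
    """Collapse runs of any character listed in `chars` to one occurrence.

    Hypothetical helper name, chosen for illustration only.
    """
    if not chars:
        # An empty character class would not compile, so return the text unchanged.
        return text
    # Group 1 captures one listed character; \1+ consumes its immediate repeats.
    pattern = re.compile("([" + re.escape(chars) + r"])\1+")
    return pattern.sub(r"\1", text)

print(squeeze_repeats("Hello... _my name -- is __Alex", "_.-"))
print(squeeze_repeats("Hello _my name is __Alex", "_"))
```

The second call is the case from the question, where only underscores are de-duplicated.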

Alternative approach: write a generator expression that keeps every character except those that are in to_remove and identical to the character preceding them.

>>> "".join(s[i] for i in range(len(s)) if i == 0 or not (s[i-1] == s[i] and s[i] in to_remove))
'Hello. _my name - is _Alex'
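If the one-liner is hard to read, the same logic can be spelled out as an explicit loop. A rough sketch, where the function name squeeze_repeats_loop and the use of a set for membership tests are my own choices:

```python
def squeeze_repeats_loop(text, chars):
    """Drop a character when it is listed in `chars` and equals the one before it."""
    targets = set(chars)  # set lookup instead of scanning the string each time
    kept = []
    previous = None  # no character precedes the first one
    for ch in text:
        # Keep everything except an immediate repeat of a listed character.
        if not (ch == previous and ch in targets):
            kept.append(ch)
        previous = ch
    return "".join(kept)

print(squeeze_repeats_loop("Hello... _my name -- is __Alex", "_.-"))
```

Like the generator expression, this makes a single pass over the string.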

Alternative approach #2: use groupby to identify groups of consecutive identical characters, then join the values together, using to_remove membership testing to decide whether a group contributes one character or all of them. (groupby compares the items themselves when no key function is given.)

>>> import itertools
>>> "".join(k if k in to_remove else "".join(v) for k, v in itertools.groupby(s))
'Hello. _my name - is _Alex'
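To see what groupby is actually handing back, it can help to print the runs it finds. A small illustration on a made-up sample string (the variable names are mine):

```python
import itertools

sample = "a__b--c"
# Each run of equal adjacent characters becomes one (key, run length) pair.
runs = [(key, len(list(group))) for key, group in itertools.groupby(sample)]
print(runs)  # [('a', 1), ('_', 2), ('b', 1), ('-', 2), ('c', 1)]
```

The join above emits just the key for runs whose character is in to_remove, and the full run otherwise.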

Alternative approach #3: call re.sub once for each member of to_remove. Each call rescans the whole string, so this gets a bit expensive if to_remove contains a lot of characters. Note that the loop rebinds s.

>>> for c in to_remove:
...     s = re.sub(rf"({re.escape(c)})\1+", r"\1", s)
...
>>> s
'Hello. _my name - is _Alex'
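If only one character needs de-duplicating, as in the original question, the loop isn't needed at all. A minimal sketch, assuming the underscore is the only target:

```python
import re

text = "Hello _my name is __Alex"
# "_+" matches a run of one or more underscores; each run becomes a single "_".
result = re.sub(r"_+", "_", text)
print(result)  # Hello _my name is _Alex
```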