python regex string replace capturing-group

Python re.sub() is not replacing every match

I'm using Python 3 and I have two strings: abbcabb and abca. I want to remove every double occurrence of a single character. For example:

abbcabb should give c and abca should give bc.

I've tried the following regex (here):

(.)(.*?)\1

But it gives the wrong output for the first string. I also tried another one (here):

(.)(.*?)*?\1

But this one gives the wrong output as well. What's going wrong here?

The Python code is a single print statement:

print(re.sub(r'(.)(.*?)\1', r'\g<2>', s)) # s is the string

Solution

The site explains it well: hover over the pattern and use the explanation section.

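(.)(.*?)\1 does not remove or match every double occurrence. It matches one character, followed by anything sandwiched in the middle, up to the next occurrence of that same character.

So, for abbcabb, the "sandwiched" portion is bbc, between the two a's. The whole match abbca is replaced by bbc, and re.sub() then continues scanning after that match without rescanning the replacement text. The remaining bb is matched next and replaced by an empty string, so the result is bbc rather than c.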
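To see this for yourself, here is a small sketch that lists what the pattern actually matches. The helper names show_matches and strip_pairs are made up for illustration; only re.compile, finditer and sub come from the standard library.

```python
import re

# The pattern from the question: one character, a lazy middle part,
# then the same character again.
pattern = re.compile(r'(.)(.*?)\1')

def show_matches(s):
    """Return (whole match, group 2) for each non-overlapping match."""
    return [(m.group(0), m.group(2)) for m in pattern.finditer(s)]

def strip_pairs(s):
    """Apply the substitution from the question."""
    return pattern.sub(r'\g<2>', s)

print(show_matches("abbcabb"))  # [('abbca', 'bbc'), ('bb', '')]
print(strip_pairs("abbcabb"))   # bbc
print(strip_pairs("abca"))      # bc
```

The first match swallows abbca in one go, so the b's inside it are never examined as a pair of their own.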

EDIT: You can try something like this instead without regexes:

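string = "abbcabb"
result = []  # characters seen an odd number of times so far, in order
for ch in string:
    if ch not in result:
        result.append(ch)   # odd occurrence: keep it
    else:
        result.remove(ch)   # even occurrence: cancels the earlier one
print(''.join(result))  # prints c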

Note that this keeps each character at the position of its "last" odd occurrence, not its first.

For the "first" occurrence, you should use a counter as suggested in this answer. Just change the condition to check for odd counts, e.g. count[letter] % 2 == 1.
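A rough sketch of the counter idea, next to the list-based loop so the ordering difference is visible. The function names odd_chars_last and odd_chars_first are my own, and the counter version relies on Counter keeping keys in first-seen order, which assumes Python 3.7 or later.

```python
from collections import Counter

def odd_chars_last(s):
    """List-based loop: a character ends up where its last odd occurrence was."""
    result = []
    for ch in s:
        if ch in result:
            result.remove(ch)
        else:
            result.append(ch)
    return ''.join(result)

def odd_chars_first(s):
    """Counter-based: keep characters with an odd count, in first-seen order."""
    counts = Counter(s)
    return ''.join(ch for ch, n in counts.items() if n % 2 == 1)

print(odd_chars_last("abaca"))     # bca
print(odd_chars_first("abaca"))    # abc
print(odd_chars_first("abbcabb"))  # c
print(odd_chars_first("abca"))     # bc
```

Both versions agree on the two strings from the question. They only differ when a character appears three or more times, as with the a in abaca.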