python regex string replace capturing-group

Python re.sub() is not replacing every match


I'm using Python 3 and I have two strings: abbcabb and abca. I want to remove every double occurrence of a single character. For example:

abbcabb should give c and abca should give bc.

I've tried the following regex (here):

(.)(.*?)\1

But it gives the wrong output for the first string. I also tried another one (here):

(.)(.*?)*?\1

But this one also gives the wrong output. What's going wrong here?


The Python code is a single print statement:

import re
print(re.sub(r'(.)(.*?)\1', r'\g<2>', s))  # s is the input string
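
A minimal reproduction of the call above, assuming the two sample strings from the question (the `samples` and `results` names are only for illustration). The trailing comments show what each call returns next to the output I want:

```python
import re

pattern = r'(.)(.*?)\1'
# Hypothetical helper mapping: input string -> desired output
samples = {'abbcabb': 'c', 'abca': 'bc'}

# Apply the same substitution as in the print statement to each sample
results = {s: re.sub(pattern, r'\g<2>', s) for s in samples}

for s, want in samples.items():
    print(f'{s}: got {results[s]!r}, want {want!r}')
# abbcabb: got 'bbc', want 'c'
# abca: got 'bc', want 'bc'
```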

Solution

  • The site explains it well: hover over the pattern and read the explanation section.

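    (.)(.*?)\1 does not match or remove every double occurrence. It matches one character, followed by anything sandwiched in the middle, until that same character is encountered again.

    So, for abbcabb, the "sandwiched" portion is bbc, between the two a's. The match abbca is replaced by bbc; the remaining bb then matches with an empty middle and is removed, so the result is bbc rather than c.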

    EDIT: You can try something like this instead, without regexes:

    string = "abbcabb"
    result = []
    for i in string:
        if i not in result:
            result.append(i)
        else:
            result.remove(i)
    print(''.join(result))
    

    Note that this keeps each surviving character at the position of its last occurrence, not its first. For example, abaca gives bca, not abc.

    To keep the first occurrence instead, use a counter as suggested in this answer. Just change the condition to check for odd counts; in pseudocode: count[letter] % 2 == 1.
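
The counter-based variant mentioned above might look like the sketch below. The linked answer is not reproduced here, so this is an assumption about its approach: count every character first, then keep one copy of each character with an odd count, at the position where it first appears. The function names `keep_odd_first` and `keep_odd_last` are made up for illustration; `keep_odd_last` is the loop from the EDIT wrapped in a function for comparison.

```python
from collections import Counter


def keep_odd_first(s):
    """Keep one copy of each odd-count character, in first-occurrence order."""
    counts = Counter(s)   # total occurrences of each character
    seen = set()          # characters already written to the output
    out = []
    for ch in s:
        if counts[ch] % 2 == 1 and ch not in seen:
            seen.add(ch)
            out.append(ch)
    return ''.join(out)


def keep_odd_last(s):
    """The list-based loop from the EDIT: toggle each character in and out."""
    result = []
    for ch in s:
        if ch not in result:
            result.append(ch)
        else:
            result.remove(ch)
    return ''.join(result)


print(keep_odd_first('abbcabb'))  # c
print(keep_odd_first('abca'))     # bc
print(keep_odd_first('abaca'))    # abc
print(keep_odd_last('abaca'))     # bca
```

Both functions agree on the two strings from the question; they differ only when an odd-count character appears more than once, as in abaca.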