Search code examples
pythonregex

Python Regex to match the FIRST repetition of a digit


Examples:

  1. For 0123123123, 1 should be matched since the 2nd 1 appears before the repetition of any other digit.
  2. For 01234554321, 5 should be matched since the 2nd 5 appears before the repetition of any other digit.

Some regexes that I have tried:

  1. The below works for the 1st but not the 2nd example. It matches 1 instead because 1 is the first digit that appears in the string which is subsequently repeated.
import re
m = re.search(r"(\d).*?\1", string)
print(m.group(1))
  1. The below works for the 2nd but not the 1st example. It matches 3 instead - in particular the 2nd and 3rd occurrence of the digit. I do not know why it behaves that way.
import re
m = re.search(r"(\d)(?!(\d).*?\2).*?\1", string)
print(m.group(1))

Solution

  • One idea: capture the end of the string and add it in the negative lookahead (group 2 here):

    (\d)(?=.*?\1(.*))(?!.*?(\d).*?\3.+?\2$)
    

    This way you can control where the subpattern .*?(\d).*?\3 in the negative lookahead ends. If .+?\2$ succeeds, that means there's an other digit that is repeated before the one in group 1.

    I anchored the pattern for the regex101 demo with ^.*?, but you don't need to do that with the re.search method.


    Other way: reverse the string and find the last repeated digit:

    re.search(r'^.*(\d).*?\1', string[::-1]).group(1)