python regex capturing-group

regex - how to match a group of unique characters of a certain length


I'm looking for a regex that matches ad-hoc groups of characters of a certain length, but only if all of their characters are unique.

For example, given the string:

123132213231312321112122121111222333211221331

123, 132, 213, 231, 312 and 321 should be matched, while 112, 122, 121, 111, 313, 322, 221, 323, 131, etc. should not.
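To pin down the target behaviour, here is a plain-Python reference with no regex at all. It assumes the groups are overlapping windows of the string (which is how the answer interprets the question, since 313 and 322 only occur across chunk boundaries); unique_windows is a hypothetical helper, handy for checking any candidate regex against.

```python
def unique_windows(text, length):
    """Return every overlapping window of `length` characters with no repeated character."""
    return [
        text[i:i + length]
        for i in range(len(text) - length + 1)
        if len(set(text[i:i + length])) == length  # all characters distinct
    ]


print(unique_windows("12321", 3))  # ['123', '321']
```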

I tried (?:([0-9])(?!.{3}\1)){3}, but it's completely wrong.
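A quick check (a sketch assuming Python's standard re module) shows why this attempt fails: after each digit is captured, (?!.{3}\1) skips three characters and compares the captured digit with the character four positions after it. It never compares the digits inside the same three-character group with each other, so a run of identical digits can still match.

```python
import re

attempt = r"(?:([0-9])(?!.{3}\1)){3}"

# Both strings match in full: the lookahead never compares digits within the group.
print(bool(re.fullmatch(attempt, "123")))  # True
print(bool(re.fullmatch(attempt, "111")))  # True, although "111" repeats a digit
```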


Solution

  • Iterate over the input string: on each iteration, search for this expression, then chop off everything up to and including the first character of the match, and repeat until there is no match:

    ((\d)((?!\2)\d)((?!\2)(?!\3)\d))
    

    You could use re.findall instead, but then you won't detect overlapping matches, such as the two in "12321" (123 and 321). You'd only find the first one: "123".

    Of course, this only works for digits. To match word characters as well, you could use:

    ((\w)((?!\2)\w)((?!\2)(?!\3)\w))
    

    For a longer length, just follow the same pattern: each new character gets a negative lookahead for every previously captured character. For example, for length 4:

    ((\w)((?!\2)\w)((?!\2)(?!\3)\w)((?!\2)(?!\3)(?!\4)\w))
    

    So, here is some (hopefully correct) Python code that builds the regex for an arbitrary length:

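    import re

    length = 4  # your arbitrary length (renamed from max, which shadows a builtin)
    regex = "((\\w)"  # group 1 is the whole match, group 2 the first character
    for i in range(1, length):  # add the remaining length - 1 characters
        regex += "("
        for j in range(2, i + 2):  # forbid every character captured so far
            regex += "(?!\\" + str(j) + ")"
        regex += "\\w)"
    regex += ")"  # close group 1
    pattern = re.compile(regex)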
    

    Phew
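Putting the pieces together, here is a self-contained sketch, assuming Python 3 and the standard re module; the function names build_unique_regex, unique_groups_by_chopping and unique_groups_by_lookahead are hypothetical. The first search function follows the answer's search-and-chop loop. The second uses a different, commonly used technique that is not part of the answer: wrapping the pattern in a zero-width lookahead (?=...) so that re.finditer reports overlapping matches without slicing the string.

```python
import re


def build_unique_regex(length):
    """Build the answer's pattern for `length` distinct word characters.

    Group 1 is the whole match; groups 2..length+1 hold the individual characters.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    regex = r"((\w)"  # group 2: first character
    for i in range(1, length):
        regex += "("
        for j in range(2, i + 2):  # reject every character captured so far
            regex += r"(?!\%d)" % j
        regex += r"\w)"
    return regex + ")"  # close group 1


def unique_groups_by_chopping(text, length):
    """The answer's loop: search, record, drop through the match's first character."""
    pattern = re.compile(build_unique_regex(length))
    found = []
    while True:
        m = pattern.search(text)
        if m is None:
            break
        found.append(m.group(1))
        text = text[m.start() + 1:]  # chop up to and including the first matched char
    return found


def unique_groups_by_lookahead(text, length):
    """Alternative technique: a zero-width lookahead lets finditer find overlaps."""
    pattern = re.compile("(?=" + build_unique_regex(length) + ")")
    return [m.group(1) for m in pattern.finditer(text)]


print(unique_groups_by_chopping("12321", 3))   # ['123', '321']
print(unique_groups_by_lookahead("12321", 3))  # ['123', '321']
```

Both functions visit every start position that begins a valid group, in order, so they should return the same list; the lookahead version simply avoids building a new string on every iteration.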