Tags: c#, regex, regex-lookarounds

Regex: split string into sets of chars from a pool


Is there a way to write a regex that splits a string into sets of characters from a limited pool, with no duplicates inside a set? For example, given a pool of characters (A, B, and C), the string "AABABCCAB" should be split into "A", "AB", "ABC", and "CAB". Note that the order within a set does not matter.

Thanks!

It looks like the regex needs positive and negative lookaheads, but I haven't had any success with them.
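To pin down the splitting rule from the example, here is a minimal non-regex C# sketch. It assumes the split is greedy (a new set starts as soon as the next character already appears in the current set) and that the input contains only pool characters; `SplitIntoSets` is a hypothetical helper name, used only as a reference for the expected output.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical reference helper (assumption: greedy split, input uses only pool chars).
// It grows the current set and closes it when a character would repeat.
static List<string> SplitIntoSets(string input)
{
    var result = new List<string>();
    var current = new StringBuilder();
    var seen = new HashSet<char>();

    foreach (char c in input)
    {
        // A character already in the current set closes that set.
        if (seen.Contains(c))
        {
            result.Add(current.ToString());
            current.Clear();
            seen.Clear();
        }
        current.Append(c);
        seen.Add(c);
    }

    // Flush the last, still-open set.
    if (current.Length > 0)
        result.Add(current.ToString());

    return result;
}

Console.WriteLine(string.Join(", ", SplitIntoSets("AABABCCAB"))); // A, AB, ABC, CAB
```

With a pool of three characters, no set can grow past three characters, because the fourth one is always a repeat.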


Solution

  • Try:

    ([ABC])(?:(?!\1)([ABC])(?:(?!\1|\2)[ABC])?)?
    

    See: regex101


    Explanation

    • ([ABC]): match A, B, or C and save it to group 1
    • (?:...)?: then optionally
      • (?!\1)([ABC]): match another letter from ABC that differs from the one in group 1, and save it to group 2
      • (?:...)?: and optionally match
        • (?!\1|\2)[ABC]: yet another letter from ABC, provided it is neither the one in group 1 nor the one in group 2
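A minimal C# sketch of applying the pattern above with .NET's `System.Text.RegularExpressions` (the question is tagged c#). Since each match is one set, it collects the values of `Regex.Matches` rather than using `Regex.Split`; the variable names are illustrative.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The pattern from the answer: one to three distinct letters from the pool ABC.
const string pattern = @"([ABC])(?:(?!\1)([ABC])(?:(?!\1|\2)[ABC])?)?";

string input = "AABABCCAB";

// Regex.Matches scans left to right and returns non-overlapping matches;
// each match value is one set.
string[] sets = Regex.Matches(input, pattern)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

Console.WriteLine(string.Join(", ", sets)); // A, AB, ABC, CAB
```

Characters outside the pool are simply skipped by `Regex.Matches` rather than reported, so validate the input separately if that matters.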