c# .net regex balancing-groups

Regex and balancing groups


I'm stuck on a regular expression:

I have an input string made up of numbers and single letters, which can contain more numbers and letters nested in parentheses:

Just a few examples:

26U(35O40) will be read as 26 and (35 or 40)
22X(34U(42O27)) will be read as 22 xor (34 and (42 or 27))
21O(24U27) will be read as 21 or (24 and 27)
20X10X15 will be read as 20 xor 10 xor 15

I have read that this can be done using balancing groups. However, I have tried a lot of regular expressions, and the closest I got is the following:

(?<ConditionId>\d+)(?<Operator>X|U|O)?(?<Open>\()(?<ConditionId>\d+)+(?<Operator>X|U|O)?(?<ConditionId>\d+)(?<-Open>\))

I have also wondered whether I'm making this harder than it needs to be, and whether I should just run the same regex several times: first for everything outside the parentheses, then for the inner part, and again whenever the inner part matches. Something like this:

(?<ConditionId>\d+)?(?<Operator>U|O|X)?(?<Inner>(?:\().*(?:\)))

Suggestions or help?

Thanks in advance.

Edit 1: I don't have to validate the input, just parse it.

Edit 2: The reason behind this is that I need to identify each condition by its condition ID and then apply the operator against the other conditions in the input string, in the same order as they appear in the input string. A more general example that makes it easier to understand would be logic gates:

For a given input of 20X10X15, I will have to identify the conditions by their condition ID, check whether each condition is valid, and apply the XOR operator to them, something like:

true X true X false = false;
false X false X true = true;
true X (false U true) = true

That is the reason I cannot group everything into a "ConditionId" group and "Operator" group.
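
To make the intended semantics concrete, here is a minimal C# sketch of the operator mapping used in the examples (U as and, O as or, X as xor), written with top-level statements (C# 9 or later). `Apply` is a hypothetical helper name used only for illustration, and the condition values are hard-coded rather than looked up by condition ID:

```csharp
using System;

// Letter-to-operator mapping taken from the examples: U = and, O = or, X = xor.
bool Apply(char op, bool left, bool right) => op switch
{
    'U' => left && right,
    'O' => left || right,
    'X' => left ^ right,
    _ => throw new ArgumentException($"Unknown operator '{op}'", nameof(op)),
};

// The three rows above, applied left to right.
Console.WriteLine(Apply('X', Apply('X', true, true), false));   // False
Console.WriteLine(Apply('X', Apply('X', false, false), true));  // True
Console.WriteLine(Apply('X', true, Apply('U', false, true)));   // True
```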

Edit 3: This is also a valid example:

(23X10)U(30O(20X19))

Solution

  • Assuming your input is already valid, and you want to parse it, here is a rather simple regex to achieve that:

    (?:
        (?<ConditionId>\d+)
        |
        (?<Operator>[XUO])
        |
        (?<Open>\()
        |
        (?<Group-Open>\))
    )+
    

    Working example - Regex Storm - switch to the table tab to see all captures.

    The pattern captures:

    • Numbers into the $ConditionId group.
    • Operators into the $Operator group.
    • Sub-expressions in parentheses into the $Group group (needs a better name?). For example, for the string 22X(34U(42O27)), it will have two captures: 42O27 and 34U(42O27).
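
    As a rough illustration, the captures listed above could be read back in C# along these lines. This is only a sketch: it assumes C# 9 top-level statements, `CapturesOf` is a hypothetical helper name, and because the pattern is kept in its multi-line layout it has to be compiled with RegexOptions.IgnorePatternWhitespace.

    ```csharp
    using System;
    using System.Linq;
    using System.Text.RegularExpressions;

    // The pattern from the answer, kept in its multi-line layout.
    // IgnorePatternWhitespace makes the engine skip the layout whitespace.
    var pattern = new Regex(@"
        (?:
            (?<ConditionId>\d+)
            |
            (?<Operator>[XUO])
            |
            (?<Open>\()
            |
            (?<Group-Open>\))
        )+",
        RegexOptions.IgnorePatternWhitespace);

    // Hypothetical helper: the text of every capture recorded for one named group.
    string[] CapturesOf(string input, string groupName) =>
        pattern.Match(input).Groups[groupName].Captures
            .Cast<Capture>()
            .Select(c => c.Value)
            .ToArray();

    const string sample = "22X(34U(42O27))";
    foreach (var name in new[] { "ConditionId", "Operator", "Group" })
    {
        Console.WriteLine(name + ": " + string.Join(", ", CapturesOf(sample, name)));
    }
    ```

    For the sample string this should print 22, 34, 42, 27 for ConditionId, X, U, O for Operator, and 42O27, 34U(42O27) for Group. The inner group comes first because its closing parenthesis is reached first.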

    Each capture contains the position of the matched string. The relations between a $Group and the $Operators, $ConditionIds and sub-$Groups it contains are expressed only through these positions.
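
    One possible way to recover that nesting from the positions is sketched below, again assuming C# 9 top-level statements. `DescribeNesting` is a hypothetical helper, not part of the regex API: for every $ConditionId and $Operator capture it picks the shortest $Group capture whose span (Index and Length) contains it, and treats anything outside all groups as top level.

    ```csharp
    using System;
    using System.Linq;
    using System.Text.RegularExpressions;

    // Same pattern as in the answer, written on one line.
    var pattern = new Regex(
        @"(?:(?<ConditionId>\d+)|(?<Operator>[XUO])|(?<Open>\()|(?<Group-Open>\)))+");

    // Hypothetical helper: maps each number/operator to its innermost enclosing group.
    string[] DescribeNesting(string input)
    {
        Match match = pattern.Match(input);
        Capture[] groups = match.Groups["Group"].Captures.Cast<Capture>().ToArray();

        return match.Groups["ConditionId"].Captures.Cast<Capture>()
            .Concat(match.Groups["Operator"].Captures.Cast<Capture>())
            .OrderBy(token => token.Index)
            .Select(token =>
            {
                // Innermost parent = shortest group whose span covers the token.
                var parent = groups
                    .Where(g => g.Index <= token.Index
                             && token.Index + token.Length <= g.Index + g.Length)
                    .OrderBy(g => g.Length)
                    .FirstOrDefault();
                string owner = parent == null ? "top level" : parent.Value;
                return token.Value + " @ " + token.Index + " -> " + owner;
            })
            .ToArray();
    }

    foreach (string line in DescribeNesting("22X(34U(42O27))"))
    {
        Console.WriteLine(line);
    }
    ```

    For 22X(34U(42O27)), this reports 22 and X as top level, 34 and U as belonging to 34U(42O27), and 42, O and 27 as belonging to 42O27. The same containment test could be applied between $Group captures to find sub-groups.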

    The (?<Group-Open>) syntax is used when we reach a closing parenthesis, to capture everything since the corresponding opening parenthesis. This is explained in more detail here: What are regular expression Balancing Groups?