Tags: regex, regex-negation, regex-look-ahead

Match a string between a specific word and the paired brackets after it, supporting a single nested bracket level with one exception


I have a problem with a regex match: I need to find a specific substring in a string. Some examples (input ==> expected match):

1. IF[A != B; C[0]; D] ==> IF[A != B; C[0]; D]
2. IF[A != B; IF[E < F; ...; ...]; D] ==> IF[E < F; ...; ...]
3. IF[A != B; C; D] ==> IF[A != B; C; D]

So, I have this regular expression: IF\[([^\[\]]*)\]. It works fine in cases 2 and 3, but it fails in case 1 because C[0] contains square brackets.

I tried to change my regex in this way: IF\[((?!IF))\] and finally IF\[(.+(?!IF))\]. I added a look-ahead to say "keep the IF that does not contain another IF". Now it works in cases 1 and 3, but case 2 returns the entire string.
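
The behaviour described above can be reproduced with a short Python sketch. Python's `re` module is an assumption here, since no regex flavor is stated, and `samples`, `flat`, `greedy` and `first_match` are illustrative names only.

```python
import re

samples = [
    "IF[A != B; C[0]; D]",
    "IF[A != B; IF[E < F; ...; ...]; D]",
    "IF[A != B; C; D]",
]

# First attempt: no square brackets at all are allowed inside IF[...]
flat = re.compile(r"IF\[([^\[\]]*)\]")
# Final attempt: greedy .+ with a look-ahead that is checked at one position only
greedy = re.compile(r"IF\[(.+(?!IF))\]")

def first_match(pattern, text):
    """Return the first matched substring, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(0) if m else None

for s in samples:
    print(s, "|", first_match(flat, s), "|", first_match(greedy, s))
```

The first pattern finds nothing in case 1 because of `C[0]`, while the second one matches the whole string in all three cases: the greedy `.+` runs to the last `]`, and `(?!IF)` is only tested right before that closing bracket.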

How can I create a correct look-ahead to solve this problem? I need to find the innermost IF in the string, which may be the entire string.

I already tried the solution in this answer: https://stackoverflow.com/a/32747960/5731129


Solution

  • You want to match IF[...] substrings where the string between square brackets may contain another pair of square brackets unless preceded with an IF, with just a single nested bracket level.

    For that, you may use

    IF\[([^][]*(?:(?<!\bIF)\[[^][]*][^][]*)*)]
    


    Details

    • IF\[ - an IF[ substring
    • ([^][]*(?:(?<!\bIF)\[[^][]*][^][]*)*) - Group 1:
      • [^][]* - 0+ chars other than [ and ]
      • (?:(?<!\bIF)\[[^][]*][^][]*)* - 0 or more occurrences of
        • (?<!\bIF)\[ - a [ char that is not immediately preceded with a whole word IF (\b is a word boundary)
        • [^][]* - 0+ chars other than [ and ]
        • ] - a ] char
        • [^][]* - 0+ chars other than [ and ]
    • ] - a ] char.
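
To check the pattern against the three examples from the question, here is a minimal Python sketch. It assumes Python's `re` module (the question does not name a regex flavor), where `[^][]` and an unescaped `]` outside a character class are accepted as written; `IF_PATTERN` and `find_if` are made-up names for illustration.

```python
import re

# The pattern from the answer, unchanged.
IF_PATTERN = re.compile(r"IF\[([^][]*(?:(?<!\bIF)\[[^][]*][^][]*)*)]")

def find_if(text):
    """Return (whole match, Group 1) for the first IF[...] that has no IF[ nested inside it."""
    m = IF_PATTERN.search(text)
    return (m.group(0), m.group(1)) if m else None

samples = [
    "IF[A != B; C[0]; D]",
    "IF[A != B; IF[E < F; ...; ...]; D]",
    "IF[A != B; C; D]",
]

for s in samples:
    print(s, "==>", find_if(s))
```

In case 2 the attempt that starts at the outer `IF[` fails, because the only `[` it could consume is preceded by `IF`, so the regex engine moves on and matches the inner `IF[E < F; ...; ...]`.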