Tags: php, regex, regex-recursion, pcre2

Regex - Recursion - nested matches with multiple endings


This is my first question on Stack Overflow, so please bear with me. Also, I am not a native English speaker.

(16.02.2022) ANSWER (https://regex101.com/r/4FRznK/1, from a comment on the answer; the pattern below must be used with the x flag, so its whitespace is insignificant). Special thanks to Casimir et Hippolyte for the help! I wish I could reach out to you.

\[if \s+ (?<cond> [^]]* ) ]

(?<content> [^[]*+ 
        (?: (?R) [^[]*
          | \[ (?! /if] | else (?:if)?  \b) [^[]*
        )*+
)
(?<rest> 
    (?: \[elseif \s+ [^]]* ] \g<content> )*+
    (?: \[else] \g<content> )?+
    \[/if]
)
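A minimal PHP sketch of using the pattern above, assuming PHP 7.3+ (PCRE2). The template string and variable names are illustrative, not from the question. Because captures made inside (?R) and \g<content> revert to their previous values when the call returns, the named groups describe only the outermost statement:

```php
<?php
// The final pattern from above, wrapped in ~...~x
// (x is required because the pattern relies on insignificant whitespace).
$pattern = <<<'REGEX'
~
\[if \s+ (?<cond> [^]]* ) ]

(?<content> [^[]*+
        (?: (?R) [^[]*
          | \[ (?! /if] | else (?:if)?  \b) [^[]*
        )*+
)
(?<rest>
    (?: \[elseif \s+ [^]]* ] \g<content> )*+
    (?: \[else] \g<content> )?+
    \[/if]
)
~x
REGEX;

// Hypothetical template: [if b] is nested inside the first branch of [if a].
$tpl = '[if a]x[if b]y[else]z[/if]w[elseif c]v[else]u[/if]';
$found = preg_match($pattern, $tpl, $m);

// $m['cond']    === 'a'
// $m['content'] === 'x[if b]y[else]z[/if]w'   (first branch only)
// $m['rest']    === '[elseif c]v[else]u[/if]'
```

The [elseif]/[else] branches stay inside rest, so a second pass over rest would be needed to split them.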

(15.02.2022) UPDATE: I experimented with the solutions presented below and got further. There seems to be a string length beyond which the pattern can no longer match comfortably without catastrophic backtracking.
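In PHP, hitting pcre.backtrack_limit, pcre.recursion_limit or the JIT stack limit does not throw: preg_match_all() just returns false, which is easy to mistake for "no match". A minimal sketch that makes such failures visible; the wrapper name is hypothetical, and preg_last_error_msg() assumes PHP 8.0+:

```php
<?php
// Hypothetical wrapper (the name is illustrative, not a PHP built-in).
function pregMatchAllOrThrow(string $pattern, string $subject): array
{
    $count = preg_match_all($pattern, $subject, $matches);
    if ($count === false) {
        // Reached when PCRE gives up, e.g. PREG_BACKTRACK_LIMIT_ERROR,
        // PREG_RECURSION_LIMIT_ERROR or PREG_JIT_STACKLIMIT_ERROR.
        // preg_last_error_msg() needs PHP 8.0+; use preg_last_error() before that.
        throw new RuntimeException('PCRE error: ' . preg_last_error_msg());
    }
    return $matches;
}

// The limits are ini settings; raising them only moves the threshold.
// Possessive quantifiers (*+, ?+), as in the answer's pattern, reduce the backtracking itself.
// ini_set('pcre.backtrack_limit', '1000000');

$m = pregMatchAllOrThrow('~\[if\s+([^]]*)]~', '[if a]x[/if][if b]y[/if]');
```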

I have updated my regex101 demo to show my latest progress. Maybe one of you has an idea how to tackle this: https://regex101.com/r/wYzA3e/4

SIDE NOTE: I do have a working function for this, but my goal is a faster and more reliable solution. My current function takes (in my opinion) far too long to complete the task and relies heavily on strpos to get there. I do not want to use third-party libraries if there is a cheaper (in terms of performance) solution using PHP's built-in functions. So if you advise me to use an alternative approach, please be kind and say specifically which one you mean. Thank you!
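One alternative that uses only PHP built-ins is a single flat regex scan for the tags plus a depth counter, with no recursion at all. This is a suggestion, not code from the question or the answer; findOuterIf() and its return keys are hypothetical, and it assumes well-formed input (it does not validate, for example, two [else] tags in one statement):

```php
<?php
/**
 * Hypothetical helper: find the first [if X] at or after $offset and the tag
 * that ends its first branch at the same nesting level
 * ([elseif X], [else] or [/if]). Returns null if none is found.
 */
function findOuterIf(string $tpl, int $offset = 0): ?array
{
    // One flat, non-recursive scan that lists every tag with its byte offset.
    preg_match_all('~\[(?:if\s+[^]]*|elseif\s+[^]]*|else|/if)]~', $tpl, $tags, PREG_OFFSET_CAPTURE, $offset);

    $start = null; // [tag, offset] of the outer [if X]
    $depth = 0;    // number of nested [if] statements currently open
    foreach ($tags[0] as [$tag, $pos]) {
        $isIf = strncmp($tag, '[if', 3) === 0;
        if ($start === null) {
            if ($isIf) {
                $start = [$tag, $pos]; // first opening tag = outer statement
            }
            continue;                  // stray tags before it are ignored
        }
        if ($isIf) {
            $depth++;                  // entering a nested statement
        } elseif ($depth > 0) {
            if ($tag === '[/if]') {
                $depth--;              // leaving a nested statement
            }                          // nested [else]/[elseif] are skipped
        } else {
            // Back at the outer level: this tag ends the outer first branch.
            $contentStart = $start[1] + strlen($start[0]);
            return [
                'cond'    => trim(substr($start[0], 3, -1)),
                'content' => substr($tpl, $contentStart, $pos - $contentStart),
                'end'     => $tag,
                'endPos'  => $pos,
            ];
        }
    }
    return null; // no complete statement found
}

$r = findOuterIf('[if a][if b]x[else]y[/if][elseif c]z[/if]');
// $r['cond'] === 'a', $r['content'] === '[if b]x[else]y[/if]', $r['end'] === '[elseif c]'
```

The tag pattern has no recursion or nested quantifiers, so it avoids the runaway backtracking described above; the nesting logic lives in plain PHP instead.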

(14.02.2022) Original: I am having the following difficulties with my regular expression. This is the string ("true" and "false" are not actually in the string, but they simplify the example):

**[if true]**
    [if true]
        [if false]
        [else]
        [/if]
    *[elseif false]*
        [if true]
        [/if]
    [else]
    [/if]
**[elseif false]**
[/if]
**[if false]**
    [if false]
    [else]
    *[/if]*
**[else]**
    [if true]
    [/if]
[/if]

I marked the matches I want with (**) and the ones I got with (*).

In this situation I only want to match the outermost parent [if XXX] statement together with its corresponding end, which can be [else], [elseif XXX] or [/if]. For now I do not care about the inner [if XXX] statements, since I don't need to check them when the parent is false.

When running my regex:

/\[if (.*?)\](((?R)|.)*?)(\[\/if\]|\[else\]|\[elseif )/gs 

it matches the parent [if XXX], but the match ends at an [elseif XXX], [else] or [/if] that belongs to one of the nested statements rather than to the parent.
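The reason shows up on a small input. (?R) re-runs the whole pattern, including its lazy end-tag alternation, so a nested [if] stops at its first [else]; its own [/if] is then left over and closes the outer match. A sketch reproducing this (the template is illustrative; the g flag is dropped because it is not a PHP modifier, and preg_match_all() provides global matching instead):

```php
<?php
// The question's original pattern, minus the regex101-only g flag.
$pattern = '/\[if (.*?)\](((?R)|.)*?)(\[\/if\]|\[else\]|\[elseif )/s';

$tpl = '[if a]x[if b]y[else]z[/if]w[/if]';
preg_match_all($pattern, $tpl, $m);

// The recursion for [if b] ends at its [else], so its [/if] ends the outer match:
// $m[0][0] === '[if a]x[if b]y[else]z[/if]'   (the real end of [if a] is the final [/if])
// $m[4][0] === '[/if]'
```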

As capture groups I need: the whole match, the condition X of every [if X], the content between [if X] and its matching end tag, and the end tag itself ([else], [elseif X] or [/if]).
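One way to get exactly these groups, not taken from the answer, is to move the answer's block structure into a (?(DEFINE)...) section and call it with (?&name). Nested statements are then skipped as a whole, so the end group can only be a tag at the outer level. A minimal sketch with an illustrative template, assuming PHP 7.3+ (PCRE2):

```php
<?php
$pattern = <<<'REGEX'
~
(?(DEFINE)
    (?<block>                                    # one complete [if] ... [/if] statement
        \[if \s+ [^]]* ]
        (?&body)
        (?: \[elseif \s+ [^]]* ] (?&body) )*+
        (?: \[else] (?&body) )?+
        \[/if]
    )
    (?<body> [^[]*+ (?: (?&block) [^[]*+ )*+ )   # text plus whole nested statements
)
\[if \s+ (?<cond> [^]]* ) ]
(?<content> (?&body) )
(?<end> \[else] | \[elseif \s+ [^]]* ] | \[/if] )
~x
REGEX;

// Hypothetical template: the nested [else] and [/if] must not end the match.
$tpl = '[if a]x[if b]y[else]z[/if]w[elseif c]v[/if]';
$found = preg_match($pattern, $tpl, $m);

// $m['cond'] === 'a', $m['content'] === 'x[if b]y[else]z[/if]w', $m['end'] === '[elseif c]'
```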

Since I do not fully understand recursion, I'd appreciate your help. Many thanks in advance!

You can try the regex here (updated): https://regex101.com/r/wYzA3e/4


Solution

  • When a pattern starts to get a little complicated, two features can help:

    • the verbose mode (x modifier)
    • references to subpatterns, or better, references to named subpatterns (\g<name>)

    With these two features, things often become clearer and the pattern is easier to build:

    ~
    \[if \s+ [^]]* ]
    
    (?<content> [^[]*+ (?: (?R) [^[]* )*+ )
    (?: \[elseif \s+ [^]]* ] \g<content> )*+
    (?: \[else] \g<content> )?+
    
    \[/if]
    ~x
    


    Note that (?R) is nothing more than a reference to a subpattern, except that here the subpattern is the whole pattern.
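A PHP sketch of the answer's pattern on two illustrative templates (assumed inputs, not from the question). It shows that (?R) consumes a nested statement as one unit, and that with an unbalanced template the outer attempt fails and the scan moves on:

```php
<?php
// The answer's pattern, unchanged apart from the PHP nowdoc wrapper.
$pattern = <<<'REGEX'
~
\[if \s+ [^]]* ]

(?<content> [^[]*+ (?: (?R) [^[]* )*+ )
(?: \[elseif \s+ [^]]* ] \g<content> )*+
(?: \[else] \g<content> )?+

\[/if]
~x
REGEX;

// (?R) swallows [if b]...[/if] whole, so the first match ends at the [/if] of [if a].
preg_match_all($pattern, '[if a]x[if b]y[/if]z[else]w[/if] tail [if c]v[/if]', $ok);

// The outer [/if] is missing: the possessive quantifiers cannot give anything back,
// the attempt at [if a] fails, and only the balanced inner statement is returned.
preg_match_all($pattern, '[if a]x[if b]y[/if]', $unbalanced);
```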