c# regex balancing-groups

Using regular expression balancing groups to match nested tags


I'm trying to use regular expression balancing groups to match nested tags that look like this:

some text ...
 {list}
    nesting loop content
    {list}
        {list}
            {list}
                bala ...
            {/list}
        {/list}
    {/list}

{/list}
end

My expression:

\{(?<NAME>.+?)\}
[^\{\}]*
    (
        (
            \{(?<NAME2>.+?)\}(?<OPEN>)
            [^\{\}]*?
        )
        (
            \{\/\k<NAME2>\}(?<-OPEN>)
            [^\{\}]*?
        )
    )*
    (?(OPEN)(?!))
\{\/\k<NAME>\}

My problem:

Only the last two levels of nested pairs are matched.

Solution

  • Typically, to match nested tags, you'd want something similar to:

    (?>
      \{(?<Open>\w+)\}
      |
      \{/(?<-Open>\k<Open>)\}
      |
      (?(Open)[^{}]+)
      )*
    (?(Open)(?!))
    

    Working example: Regex Storm

    This way you can match nested tags of different types, which looks like what you're trying to do. For example, it would match this:

    {list}
        nesting loop content
        {world}
            {list}
                {hello}
                    bala ...
                {/hello}
            {/list}
        {/world}
    {/list}
    

    Notes:

    • I'm using (?(Open)[^{}]+) so we only match free text if it is within tags.
    • I'm using the same group for the top level and the inner levels.
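
The pattern is laid out over several lines, so in C# it needs `RegexOptions.IgnorePatternWhitespace`. Here is a minimal sketch of how it might be used. The names `NestedTags` and `FindBalanced` are made up for illustration, and the empty-match filter is my own addition: outside any tag the pattern can succeed with an empty match, which is rarely what you want to collect.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of any library.
public static class NestedTags
{
    // Same pattern as above. The whitespace and # comments inside it are
    // layout only, which is why IgnorePatternWhitespace is required.
    private static readonly Regex Balanced = new Regex(@"
        (?>
          \{(?<Open>\w+)\}            # opening tag: push its name onto Open
          |
          \{/(?<-Open>\k<Open>)\}     # closing tag: must equal the top name, then pop it
          |
          (?(Open)[^{}]+)             # free text, only while a tag is open
        )*
        (?(Open)(?!))                 # fail if any tag is still open
        ", RegexOptions.IgnorePatternWhitespace);

    // Returns every non-empty balanced block found in the input.
    public static List<string> FindBalanced(string input)
    {
        var result = new List<string>();
        foreach (Match m in Balanced.Matches(input))
        {
            // Skip the empty matches the pattern produces outside of tags.
            if (m.Length > 0) result.Add(m.Value);
        }
        return result;
    }
}
```

For example, `NestedTags.FindBalanced("some text {list}a{world}b{/world}{/list} end")` should return the single block `{list}a{world}b{/world}{/list}`, while crossed tags such as `{a}{b}{/a}{/b}` should return nothing.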

    Yours was pretty close. You are basically missing one alternation between the middle groups:

    (
        \{(?<NAME2>.+?)\}(?<OPEN>)
        [^\{\}]*?
    )
    | # <---- This
    (
        \{\/\k<NAME2>\}(?<-OPEN>)
        [^\{\}]*?
    )
    

    Working example

    However, you are always using the last value of $NAME2. $NAME2 is a stack, but you never pop values from it, only push. This causes a bug: it would also match this string (which is probably wrong):

    {list}                  # Set $Name = "list"
        nesting loop content
        {world}             # Set $Name2 = "world"
            {world}         # Set $Name2 = "world"
                {hello}     # Set $Name2 = "hello"
                    bala ...
                {/hello}    # Match $Name2 ("hello")
            {/hello}        # Match $Name2 ("hello")
        {/hello}            # Match $Name2 ("hello")
    {/list}            # Match $Name ("list")
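
To see the difference, here is a rough sketch that runs both patterns against a compact version of that string. `QuestionPattern` is my reconstruction of your expression with the alternation added (assuming `\k<...>` backreferences), `StackPattern` is an anchored variant of the single-stack pattern, and the class and method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical comparison helper for illustration only.
public static class TagCheck
{
    private const RegexOptions Opts = RegexOptions.IgnorePatternWhitespace;

    // The question's expression with the missing alternation added.
    // NAME2 is only ever pushed, so \k<NAME2> always sees the last name.
    private static readonly Regex QuestionPattern = new Regex(@"
        \{(?<NAME>.+?)\}
        [^\{\}]*
        (
            ( \{(?<NAME2>.+?)\}(?<OPEN>)   [^\{\}]*? )
            |
            ( \{\/\k<NAME2>\}(?<-OPEN>)    [^\{\}]*? )
        )*
        (?(OPEN)(?!))
        \{\/\k<NAME>\}
        ", Opts);

    // Single-stack pattern, anchored so the whole input must be balanced.
    // Each closing tag is compared with the top of Open and then pops it.
    private static readonly Regex StackPattern = new Regex(@"
        ^(?> \{(?<Open>\w+)\} | \{/(?<-Open>\k<Open>)\} | (?(Open)[^{}]+) )*
        (?(Open)(?!))$
        ", Opts);

    public static bool AcceptedByQuestionPattern(string input)
    {
        return QuestionPattern.IsMatch(input);
    }

    public static bool AcceptedByStackPattern(string input)
    {
        return StackPattern.IsMatch(input);
    }
}
```

With `"{list}{world}{world}{hello}x{/hello}{/hello}{/hello}{/list}"` as input, `AcceptedByQuestionPattern` should return true even though the closing tags do not pair up, while `AcceptedByStackPattern` should return false.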
    
