.net regex wikipedia balancing-groups

How can I retrieve the longest matches for substrings enclosed by "{{" and "}}"?


I am trying to parse a wikitext file received through Wikipedia's API. The problem is that some of its templates (i.e. snippets enclosed in {{ and }}) are not automatically expanded into wikitext, so I have to find them in the article source myself and replace them later. Can I use a regex in .NET to get these matches from the text?

To make this clearer, here is an example of what I mean:

For the string

{{ abc {{...}} def {{.....}} gh }}

there should be a single match, namely the entire string, because that is the longest possible match.

On the other hand, for "orphaned" braces such as in this example:

{{ abc {{...}}

the result should be a single match: {{...}}

Could anyone offer a suggestion? Thanks in advance.


Solution

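  • Don't do it with regex. Go through the string from left to right. Whenever you encounter a {{, push its position onto a stack. Whenever you encounter a }}, pop the position of the most recent {{ from the stack; the two positions give you a balanced substring and its length. An unmatched {{ simply stays on the stack, and a }} that arrives when the stack is empty is ignored. You can then easily take the maximum of these lengths.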
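
The stack-based scan described above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: it is written in Python for brevity (the same logic should carry over to a `Stack<int>` in .NET), the function name `longest_template_spans` is hypothetical, and it assumes that the only delimiters of interest are the two-character sequences `{{` and `}}` (triple braces and other wikitext constructs are not treated specially). Instead of only tracking lengths, it keeps the outermost balanced spans, which appears to be what the question asks for.

```python
def longest_template_spans(text):
    """Return the outermost balanced {{ ... }} substrings of text.

    Hypothetical helper for illustration; assumes plain "{{" / "}}" pairs only.
    """
    open_positions = []  # stack of indices where an unmatched "{{" starts
    spans = []           # (start, end) of balanced spans, outermost only
    i = 0
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "{{":
            open_positions.append(i)
            i += 2
        elif pair == "}}":
            if open_positions:  # ignore a "}}" that has no matching "{{"
                start = open_positions.pop()
                # Any span recorded after `start` is nested in the new one.
                while spans and spans[-1][0] >= start:
                    spans.pop()
                spans.append((start, i + 2))
            i += 2
        else:
            i += 1
    return [text[start:end] for start, end in spans]


if __name__ == "__main__":
    print(longest_template_spans("{{ abc {{...}} def {{.....}} gh }}"))
    print(longest_template_spans("{{ abc {{...}}"))
```

For the two examples in the question, the first call yields the whole string as its only match and the second yields only `{{...}}`, because the orphaned opening `{{` is never popped from the stack.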