Search code examples
regexrecursionsubroutine

Regex, single match - Recursion vs. Subroutine Call


A common real-world use is to match a balanced set of parentheses.

\((?>[^()]|(?R))*\) matches a single pair of parentheses with any text in between, including an unlimited number of parentheses, as long as they are all properly paired. (...)

If you want a regex that does not find any matches in a string that contains unbalanced parentheses, then you need to use a subroutine call instead of recursion. (AND THIS IS WHAT I DONT UNDERSTAND ->). If you want to find a sequence of multiple pairs of balanced parentheses as a single match, then you also need a subroutine call.

So using "(?R)" will not produce a "single match". Is it something different than a "single match", something like multiple matches in one match?

Source: https://www.regular-expressions.info/recurse.html

"Matching balanced constructs" part.


Solution

  • If you want to find a sequence of multiple pairs of balanced parentheses as a single match, then you also need a subroutine call.

    I believe what this is saying that if you want to match this (.)(.) - a sequence of multiple pairs of balanced parentheses - then you need to use subroutines. I admit, an example would have helped in the tutorial.

    Why will recursion fail to return "as a single match"? This simply because of the way recursion works - it is applying the same regex to an ever decreasing string (if it's not infinitely recursing) and returning a result when it reaches a non-matching element, then proceeding further along the string. It does not match (.)(.) as a single string, because it is not defined to do so (you can do that using quantifiers: (?>\((?>[^()]|(?R))*\)){2} or more generally: (?>\((?>[^()]|(?R))*\))*).

    Here is what the pattern presented in the tutorial produces for a sequence of multiple pairs of balanced parentheses (line 1) and for balanced constructs or nested constructs (line 2):

    enter image description here