php, regex, preg-match, pcre

Get matches between two delimiters when potentially nested


The delimiters in my case are opening and closing parentheses. When they are not nested, I can get the text between them as follows:

$input = 'sometext(moretext)andmoretext(somemoretext)andevenmoretext(andmore)';
preg_match_all('#\((.*?)\)#', $input, $match);
echo('<pre>'.print_r($match[1],1).'</pre>');

Array
(
    [0] => moretext
    [1] => somemoretext
    [2] => andmore
)

However, when the parentheses are nested, the lazy match stops at the first closing parenthesis, and I get the following:

$input = 'sometext(moretext)andmoretext(somemore(with(bitof(littletext)text)more(andmore)text)text)andevenmoretext(andmore)';
preg_match_all('#\((.*?)\)#', $input, $match);
echo('<pre>'.print_r($match[1],1).'</pre>');

Array
(
    [0] => moretext
    [1] => somemore(with(bitof(littletext
    [2] => andmore
    [3] => andmore
)

How can I return the entire string between each pair of top-level delimiters, like this?

Array
(
    [0] => moretext
    [1] => somemore(with(bitof(littletext)text)more(andmore)text)text
    [2] => andmore
)

PS. Ultimately, I will use a recursive PHP function to perform the same task on any top-level matches that themselves contain parentheses.


Solution

  • You may use this recursive regex pattern to match balanced (...) groups:

    preg_match_all('/\( ( (?: [^()]* | (?R) )* ) \)/x', $input, $m);
    print_r($m[1]);
    

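    (?R) recurses the entire pattern, so each nested (...) is matched as a balanced unit inside the outer one. [^()]* consumes runs of characters that are not parentheses, and the x flag allows the whitespace used in the pattern for readability.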

    Output:

    Array
    (
        [0] => moretext
        [1] => somemore(with(bitof(littletext)text)more(andmore)text)text
        [2] => andmore
    )
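
For the follow-up mentioned in the PS (running the same match again on any top-level result that still contains parentheses), here is a minimal sketch. It assumes the same pattern as above; the function name parseGroups and the 'text' / 'children' array keys are hypothetical, chosen only for illustration, and the input is assumed to have balanced parentheses.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the contents
// of every top-level (...) group in $input, each with its nested groups
// expanded recursively under 'children'.
function parseGroups(string $input): array
{
    // Same pattern as in the answer: a literal "(" and ")" wrapped around
    // any mix of non-parenthesis runs and recursive matches via (?R).
    $pattern = '/\( ( (?: [^()]* | (?R) )* ) \)/x';

    $result = [];
    // preg_match_all returns 0 (no match) or false (error); both skip the loop.
    if (preg_match_all($pattern, $input, $m)) {
        foreach ($m[1] as $inner) {
            $result[] = [
                'text'     => $inner,
                // Recurse only when the captured text itself contains a group.
                'children' => strpos($inner, '(') !== false ? parseGroups($inner) : [],
            ];
        }
    }
    return $result;
}

$input = 'sometext(moretext)andmoretext(somemore(with(bitof(littletext)text)more(andmore)text)text)andevenmoretext(andmore)';
$tree = parseGroups($input);
print_r($tree);
```

Each entry's 'text' is what the answer's pattern captures at that level, and 'children' holds the result of the same call on that text, so the nesting depth of the array mirrors the nesting depth of the parentheses.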