How can I capture and format nested format tags on output?

I'm working on a forum system that parses BBCode like [b]some bold text[/b] and applies HTML formatting to it when output via PHP. All of my expressions work, but I'm having trouble figuring out how to deal with a certain scenario, specifically regarding nested quote blocks.

On a forum you might have one user quote another user. I have been successful at formatting this using:

#\[quote="(.*?);(\w*?)"\]\s*(.*?)\s*\[\/quote\]#

and calling preg_replace() to replace it with:

<blockquote id="quote-$2"><p>$3<br> - $1</p></blockquote>

Here is a working example.

For a real example you might see on a forum, a user, Stan, wants to quote John, adding this to a textarea for submission:

[quote="John;2"]John's sentence[/quote] 
____________

Stan's reply

But what happens if John had quoted Mary in his post?

[quote="John;2"][quote="Mary;1"]Mary's sentence[/quote]John's sentence[/quote]
____________

Stan's reply

My regex captures everything except the last [/quote], but even if I were able to capture the whole string, I'm not sure how I would format it. Ideally, I'd like the output to look something like this:

    "Mary's sentence"          
        - Mary

"John's sentence"
    - John
__________________________

Stan's reply

In HTML:

<blockquote id="quote-2">
    <blockquote id="quote-1"><p>"Mary's sentence"<br> - Mary</p></blockquote>
    <p>"John's sentence"<br> - John</p>
</blockquote>
<p>Stan's reply</p>

Can I capture and format repeated nested tags using regex? What if there are 100 nested quote blocks? Obviously I can just write a ridiculously long and repetitive expression (which certainly would have limitations), but there has to be a better way to tackle this. Is there another method I should use?

I'm sorry if a similar question already exists, but I have looked through many questions on SO and am still not sure which approach I should take.

Solution

The idea is to match only the innermost BB tag on each pass: match the text between [quote="..."] and [/quote] that does not contain another [quote=, replace it, and repeat until no such match is found. This relies on two assumptions. First, the actual tag contents never contain a literal [quote=, which holds in most cases. Second, the attribute value is enclosed in double quotes and contains no other double quotes.

So, you may use

$s = '[quote="John;2"][quote="Mary;1"]Mary\'s sentence[/quote]John\'s sentence[/quote]';
$repl = '<blockquote id="quote-$2"><p>$3 <br> - $1</p></blockquote>';
$reg = '~\[quote="([^"]*);(\w*)"]\s*((?:(?!\[quote=).)*?)\s*\[/quote]~si';
while (preg_match($reg, $s)) {
    $s = preg_replace($reg, $repl, $s);
}
echo $s;
// => <blockquote id="quote-2"><p><blockquote id="quote-1"><p>Mary's sentence <br> - Mary</p></blockquote>John's sentence <br> - John</p></blockquote>
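
The loop above runs the regex twice per pass (once in preg_match(), once in preg_replace()). A minimal sketch of a variant that uses the by-reference count argument of preg_replace() instead, and caps the number of passes, is shown below. The function name renderQuotes and the $maxPasses parameter are hypothetical and not part of the original answer; the pattern and replacement are unchanged.

```php
<?php
// Hypothetical helper: converts innermost [quote] blocks repeatedly
// until a pass makes no replacement. Each pass resolves one nesting level.
function renderQuotes(string $text, int $maxPasses = 100): string
{
    $pattern = '~\[quote="([^"]*);(\w*)"]\s*((?:(?!\[quote=).)*?)\s*\[/quote]~si';
    $replacement = '<blockquote id="quote-$2"><p>$3 <br> - $1</p></blockquote>';

    for ($pass = 0; $pass < $maxPasses; $pass++) {
        // The fifth argument receives the number of replacements performed.
        $result = preg_replace($pattern, $replacement, $text, -1, $count);
        if ($result === null || $count === 0) {
            break; // regex error, or nothing left to convert
        }
        $text = $result;
    }
    return $text;
}

echo renderQuotes('[quote="John;2"][quote="Mary;1"]Mary\'s sentence[/quote]John\'s sentence[/quote]');
```

For the sample input this should print the same string as the loop above. The cap is an assumption meant to bound the work on very deeply nested input; any tags left unconverted (for example unbalanced ones) simply stay in the text as written.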

See the PHP demo. The regex is

'~\[quote="([^"]*);(\w*)"]\s*((?:(?!\[quote=).)*?)\s*\[/quote]~si'

See the regex demo.

Details

\[quote=" - a literal substring
([^"]*) - Capturing group 1: any 0+ chars other than "
; - a semicolon
(\w*) - Capturing group 2: 0+ word chars
"] - a literal substring
\s* - 0+ whitespaces
((?:(?!\[quote=).)*?) - Capturing group 3: any 0+ chars, as few as possible, none of which starts a [quote= char sequence
\s* - 0+ whitespaces
\[/quote] - a literal [/quote] substring.
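
The breakdown above can also be written into the pattern itself. The sketch below is the same regex with the x (extended) flag added so each part carries a comment, followed by a single preg_replace() pass to show that only the innermost quote is converted at first. The variable names are illustrative only.

```php
<?php
// Same pattern, spread out with the x flag: unescaped whitespace is ignored
// and everything from # to the end of the line is a comment.
$pattern = '~
    \[quote="                   # literal start of the opening tag
    ([^"]*)                     # group 1: author, anything except a double quote
    ;                           # literal semicolon
    (\w*)                       # group 2: post id, word characters
    "]                          # end of the opening tag
    \s*
    ( (?: (?!\[quote=) . )*? )  # group 3: body with no nested [quote= inside
    \s*
    \[/quote]                   # literal closing tag
~six';

$replacement = '<blockquote id="quote-$2"><p>$3 <br> - $1</p></blockquote>';
$input = '[quote="John;2"][quote="Mary;1"]Mary\'s sentence[/quote]John\'s sentence[/quote]';

// One pass: the outer quote cannot match yet because its body still
// contains [quote=, so only the inner quote is replaced.
$onePass = preg_replace($pattern, $replacement, $input);
echo $onePass, PHP_EOL;
// [quote="John;2"]<blockquote id="quote-1"><p>Mary's sentence <br> - Mary</p></blockquote>John's sentence[/quote]
```

Once the inner tag has become HTML, the outer tag no longer contains [quote= and matches on the next pass, which is why the replacement has to be repeated.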

Pretty-printing is a separate task; there are a couple of solutions mentioned here.