I'm working on a forum system that parses BBCode like [b]some bold text[/b] and applies HTML formatting to it when output via PHP. All of my expressions work, but I'm having trouble figuring out how to deal with a certain scenario, specifically regarding nested quote blocks.
On a forum you might have one user quote another user. I have been successful at formatting this using:
#\[quote="(.*?);(\w*?)"\]\s*(.*?)\s*\[\/quote\]#
and calling preg_replace() to replace it with:
<blockquote id="quote-$2"><p>$3<br> - $1</p></blockquote>
Here is a working example.
For a real example you might see on a forum, a user, Stan, wants to quote John, adding this to a textarea for submission:
[quote="John;2"]John's sentence[/quote]
____________
Stan's reply
But what happens if John had quoted Mary in his post?
[quote="John;2"][quote="Mary;1"]Mary's sentence[/quote]John's sentence[/quote]
____________
Stan's reply
My regex will capture all but the last [/quote], but even if I were able to capture the whole string, I'm not sure how I'd format it. Ideally, I'd like the output to look something like this:
"Mary's sentence"
- Mary
"John's sentence"
- John
__________________________
Stan's reply
In HTML:
<blockquote id="quote-2">
<blockquote id="quote-1"><p>"Mary's sentence"<br> - Mary</p></blockquote>
<p>"John's sentence"<br> - John</p>
</blockquote>
<p>Stan's reply</p>
Can I capture and format repeated nested tags using regex? What if there are 100 nested quote blocks? Obviously I can just write a ridiculously long and repetitive expression (which certainly would have limitations), but there has to be a better way to tackle this. Is there another method I should use?
I'm sorry if a similar question already exists, but I have looked through many questions on SO and am still not sure which approach I should take.
The idea is to make sure you only match the innermost BB tag. Match all text between [quote and [/quote] that does not contain another [quote=, and repeat the replacement until no such match is found. This assumes there is no literal [quote= in the actual quoted content, which holds in most cases. Another assumption is that the attribute value is wrapped in double quotes (") and contains no other double quotes.
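To see the innermost-only matching in isolation, here is a small sketch that runs a single preg_match_all() pass over a nested input. The pattern is the one used in this answer; the variable names are only for illustration.

```php
<?php
// Tempered pattern: group 3 may not run across another "[quote=" opener.
$reg = '~\[quote="([^"]*);(\w*)"]\s*((?:(?!\[quote=).)*?)\s*\[/quote]~si';
$s   = '[quote="John;2"][quote="Mary;1"]Mary\'s sentence[/quote]John\'s sentence[/quote]';

// A single pass finds only the innermost block (Mary's), because the
// attempt that starts at John's tag cannot step over the nested [quote=.
preg_match_all($reg, $s, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // $m[1] = author, $m[2] = post id, $m[3] = quoted text
    echo $m[1], ' (', $m[2], '): ', $m[3], PHP_EOL;
}
// prints: Mary (1): Mary's sentence
```

John's block only becomes matchable once Mary's block has been replaced with HTML, which is why the replacement has to be repeated.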
So, you may use
$s = '[quote="John;2"][quote="Mary;1"]Mary\'s sentence[/quote]John\'s sentence[/quote]';
$repl = '<blockquote id="quote-$2"><p>$3 <br> - $1</p></blockquote>';
$reg = '~\[quote="([^"]*);(\w*)"]\s*((?:(?!\[quote=).)*?)\s*\[/quote]~si';
while (preg_match($reg, $s)) {
    $s = preg_replace($reg, $repl, $s);
}
echo $s;
// => <blockquote id="quote-2"><p><blockquote id="quote-1"><p>Mary's sentence <br> - Mary</p></blockquote>John's sentence <br> - John</p></blockquote>
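A possible variation, not part of the code above: preg_replace() accepts a fifth by-reference $count argument, so the loop can simply run until a pass makes no replacement instead of testing with preg_match() first. The function name convertQuotes and the three-level sample input are made up for illustration.

```php
<?php
// Hypothetical helper: rewrites innermost [quote] blocks repeatedly
// until a pass replaces nothing.
function convertQuotes(string $s): string
{
    $reg  = '~\[quote="([^"]*);(\w*)"]\s*((?:(?!\[quote=).)*?)\s*\[/quote]~si';
    $repl = '<blockquote id="quote-$2"><p>$3 <br> - $1</p></blockquote>';

    do {
        // $count receives the number of replacements made in this pass.
        $next = preg_replace($reg, $repl, $s, -1, $count);
        if ($next === null) {
            break; // PCRE error: keep the last good string
        }
        $s = $next;
    } while ($count > 0);

    return $s;
}

// Three levels of nesting: Stan quotes John, who quotes Mary.
$nested = '[quote="Stan;3"][quote="John;2"][quote="Mary;1"]A[/quote]B[/quote]C[/quote]';
echo convertQuotes($nested);
```

Each pass peels off one nesting level, so the number of passes grows with the depth of the deepest quote rather than with the number of quotes. Unbalanced tags never match and are left in the output as they are.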
See the PHP demo. The regex is
'~\[quote="([^"]*);(\w*)"]\s*((?:(?!\[quote=).)*?)\s*\[/quote]~si'
See the regex demo.
Details
\[quote=" - a literal [quote=" substring
([^"]*) - Capturing group 1: any 0+ chars other than "
; - a semicolon
(\w*) - Capturing group 2: 0+ word chars
"] - a literal "] substring
\s* - 0+ whitespaces
((?:(?!\[quote=).)*?) - Capturing group 3: any char, as few as possible, that does not start a [quote= sequence
\s* - 0+ whitespaces
\[/quote] - a literal [/quote] substring.

Pretty-printing is an extra task; there are a couple of solutions mentioned here.