I have this text and I'm trying to remove all the inner quotes, keeping just one quoting level. The text inside a quote can contain any characters, including line feeds. Is this possible using a regex, or do I have to write a little parser?
[quote=foo]I really like the movie. [quote=bar]World
War Z[/quote] It's amazing![/quote]
This is my comment.
[quote]Hello, World[/quote]
This is another comment.
[quote]Bye Bye Baby[/quote]
Here is the text I want:
[quote=foo]I really like the movie. It's amazing![/quote]
This is my comment.
[quote]Hello, World[/quote]
This is another comment.
[quote]Bye Bye Baby[/quote]
This is the regex I'm using in PHP:
%\[quote\s*(=[a-zA-Z0-9\-_]*)?\](.*)\[/quote\]%si
I also tried this variant, but it doesn't match "." or ",", and I can't figure out what else might appear inside a quote:
%\[quote\s*(=[a-zA-Z0-9\-_]*)?\]([\w\s]+)\[/quote\]%i
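The problem is located here: the greedy quantifier, combined with the s modifier, lets this group run from the first opening tag to the last [/quote] in the text: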
(.*)
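To see the effect concretely, here is a minimal sketch that runs the pattern from the question on two sibling quotes taken from the sample text. The variable names ($greedy, $lazy, $lazyPattern) are only illustrative, and the lazy variant is shown for comparison, not as a fix.

```php
<?php
// Pattern copied from the question: greedy (.*) with the s and i modifiers.
$pattern = '%\[quote\s*(=[a-zA-Z0-9\-_]*)?\](.*)\[/quote\]%si';

// Two sibling quotes separated by a comment (from the question's sample).
$text = "[quote]Hello, World[/quote]\n"
      . "This is another comment.\n"
      . "[quote]Bye Bye Baby[/quote]";

preg_match($pattern, $text, $m);

// Group 2 is what (.*) captured: it swallows the first closing tag,
// the comment, and the second opening tag, stopping at the LAST [/quote].
$greedy = $m[2];

// For comparison, a lazy quantifier stops at the FIRST [/quote].
$lazyPattern = '%\[quote\s*(=[a-zA-Z0-9\-_]*)?\](.*?)\[/quote\]%si';
preg_match($lazyPattern, $text, $m);
$lazy = $m[2];

var_dump($greedy, $lazy);
```

The lazy version separates sibling quotes correctly, but on nested input it pairs the outer opening tag with the first inner closing tag, so it cannot handle nesting on its own.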
You can use this:
$result = preg_replace('~\G(?!\A)(?>(\[quote\b[^]]*](?>[^[]+|\[(?!/?quote)|(?1))*\[/quote])|(?<!\[)(?>[^[]+|\[(?!/?quote))+\K)|\[quote\b[^]]*]\K~', '', $text);
Details (free-spacing layout; to run the pattern in this form, add the x modifier):
\G(?!\A) # contiguous to the previous match
(?> ## content inside "quote" tags at level 0
( ## nested "quote" tags (group 1)
\[quote\b[^]]*]
(?> ## content inside "quote" tags at any level
[^[]+
| # OR
\[(?!/?quote)
| # OR
(?1) # repeat the capture group 1 (recursive)
)*
\[/quote]
)
|
(?<!\[) # not preceded by an opening square bracket
(?> ## content that is not a quote tag
[^[]+ # all that is not a [
| # OR
\[(?!/?quote) # a [ not followed by "quote" or "/quote"
)+\K # repeat 1 or more and reset the match
)
| # OR
\[quote\b[^]]*]\K # "quote" tag at level 0
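If the "little parser" route mentioned in the question is preferred, the same result can be sketched with a depth counter. This is an alternative technique, not the regex above: the function name strip_inner_quotes is hypothetical, and the sketch assumes that tags are written as [quote], [quote=...] and [/quote], and that a stray closing tag should simply be kept.

```php
<?php
// Hypothetical helper: keeps only the outermost quote level.
function strip_inner_quotes(string $text): string
{
    // Split on quote tags and keep the tags as tokens. With a single
    // capture group, even indexes are text and odd indexes are tags.
    $tokens = preg_split(
        '~(\[quote\b[^\]]*\]|\[/quote\])~i',
        $text,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );

    $depth = 0; // number of quote tags currently open
    $out = '';

    foreach ($tokens as $i => $token) {
        if ($i % 2 === 0) {
            // Plain text: keep it outside quotes or at the first level.
            if ($depth <= 1) {
                $out .= $token;
            }
        } elseif ($token[1] === '/') {
            // Closing tag: keep it if it closes the outermost quote
            // (or if it is a stray tag with nothing open).
            if ($depth <= 1) {
                $out .= $token;
            }
            if ($depth > 0) {
                $depth--;
            }
        } else {
            // Opening tag: keep it only when it opens the outermost quote.
            $depth++;
            if ($depth === 1) {
                $out .= $token;
            }
        }
    }

    return $out;
}

$sample = "[quote=foo]I really like the movie. [quote=bar]World\nWar Z[/quote] It's amazing![/quote]\n"
        . "This is my comment.\n"
        . "[quote]Hello, World[/quote]";

echo strip_inner_quotes($sample), "\n";
```

Note that the whitespace on both sides of a removed inner quote is preserved, so the first line of the sample comes out with two spaces between "movie." and "It's"; collapsing them would need an extra step.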