Remove nested quotes

I have this text and I'm trying to remove all the inner quotes, keeping just one quoting level. The text inside a quote can contain any characters, including line feeds. Is this possible with a regex, or do I have to write a small parser?

[quote=foo]I really like the movie. [quote=bar]World 

War Z[/quote] It's amazing![/quote]
This is my comment.
[quote]Hello, World[/quote]
This is another comment.
[quote]Bye Bye Baby[/quote]

Here is the text I want:

[quote=foo]I really like the movie.  It's amazing![/quote]
This is my comment.
[quote]Hello, World[/quote]
This is another comment.
[quote]Bye Bye Baby[/quote]

This is the regex I'm using in PHP:


I also tried this variant, but it doesn't match . or , and I can't figure out what else might appear inside a quote:


The problem is located here:



  • You can use this:

    $result = preg_replace('~\G(?!\A)(?>(\[quote\b[^]]*](?>[^[]+|\[(?!/?quote)|(?1))*\[/quote])|(?<!\[)(?>[^[]+|\[(?!/?quote))+\K)|\[quote\b[^]]*]\K~', '', $text);


    \G(?!\A)                # contiguous to the previous match (not at the start of the string)
    (?>                     ## content inside "quote" tags at level 0
      (                     ## nested "quote" tags (group 1)
        \[quote\b[^]]*]     # opening "quote" tag
        (?>                 ## content inside "quote" tags at any level
          [^[]+             # all that is not a [
         |                  # OR
          \[(?!/?quote)     # a [ not followed by "quote" or "/quote"
         |                  # OR
          (?1)              # repeat the capture group 1 (recursive)
        )*                  # repeat 0 or more times
        \[/quote]           # closing "quote" tag
      )
     |                      # OR
      (?<!\[)               # not preceded by an opening square bracket
      (?>                   ## content that is not a quote tag
        [^[]+               # all that is not a [
       |                    # OR
        \[(?!/?quote)       # a [ not followed by "quote" or "/quote"
      )+\K                  # repeat 1 or more times and reset the match start
    )
    |                       # OR
    \[quote\b[^]]*]\K       # "quote" tag at level 0
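
The question also asks whether a small parser would be needed instead. For comparison with the recursive pattern above, here is a minimal sketch of that alternative: split the text on quote tags and track the nesting depth with a counter. The function name `strip_nested_quotes` is hypothetical (it is not part of the answer above). The sketch assumes that tags are written exactly as `[quote]`, `[quote=...]` and `[/quote]`, and that a stray closing tag outside any quote should be left untouched.

```php
<?php
// Hypothetical helper, not from the original answer: keeps the outermost
// [quote]...[/quote] blocks and drops every quote nested inside them.
function strip_nested_quotes(string $text): string
{
    // Split on quote tags; PREG_SPLIT_DELIM_CAPTURE keeps the tags in the
    // result, so even indexes hold plain text and odd indexes hold tags.
    $tokens = preg_split(
        '~(\[quote\b[^]]*]|\[/quote])~',
        $text,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );
    if ($tokens === false) {
        return $text; // regex error: return the input unchanged
    }

    $depth = 0;  // 0 = outside any quote, 1 = top-level quote, 2+ = nested
    $out = '';

    foreach ($tokens as $i => $token) {
        $isTag = ($i % 2 === 1);

        if ($isTag && $token === '[/quote]') {
            // Keep the tag if it closes a top-level quote (or is a stray).
            if ($depth <= 1) {
                $out .= $token;
            }
            $depth = max(0, $depth - 1);
        } elseif ($isTag) {
            // Opening tag: keep it only when it starts a top-level quote.
            $depth++;
            if ($depth === 1) {
                $out .= $token;
            }
        } elseif ($depth <= 1) {
            // Plain text outside quotes or directly inside a top-level quote.
            $out .= $token;
        }
    }

    return $out;
}

// Usage with a shortened version of the sample from the question.
$text = "[quote=foo]I really like the movie. [quote=bar]World \n\nWar Z[/quote] It's amazing![/quote]\n"
      . "This is my comment.\n"
      . "[quote]Hello, World[/quote]";

echo strip_nested_quotes($text), "\n";
```

The counter makes the nesting logic explicit and tolerates unbalanced input, at the cost of a few more lines than the single `preg_replace` call.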