php regex bbcode

Remove nested bbcode style tags and anything inside them


I need help with a regex to remove some text. I can't get it to work the way I want.

Let's say I have this text:

[quote=test]
[quote=test]for sure[/quote]
Test
[/quote]

[this should not be removed]
Dont remove me

How can I remove everything above [this should not be removed]? Please note that Test can be anything.

So I want to remove anything inside:

[quote=*][/quote]

I've come this far:

preg_replace('#\[quote=(.+)](.+)\[/quote]#Usi', '', $message);

But it keeps: Test [/quote]


Solution

  • Matching nested BBCode-style tags is rather complex and is usually done with a string parser rather than a regular expression.

    Since you are using PHP, whose regex engine supports the (?R) syntax for recursion, nested BBCode can be matched like this.

    Note that unbalanced tags (an opening [quote=*] without a closing [/quote], or the reverse) will not be matched.

    Regular Expression

    \[(quote)=[^]]+\](?>(?R)|.)*?\[/quote]
    

    https://regex101.com/r/xF3oR6/1

    Code

    $result = preg_replace('%\[(quote)=[^]]+\](?>(?R)|.)*?\[/quote]%si', '', $subject);
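
    As a quick check, the pattern can be run against the sample text from the question. This is only a sketch: the variable names `$pattern` and `$unclosed` are made up for the illustration, and the sample lines are joined with "\n" on the assumption that the message uses plain newlines.

    ```php
    <?php
    // Sample text from the question, joined with newlines.
    $subject = implode("\n", [
        '[quote=test]',
        '[quote=test]for sure[/quote]',
        'Test',
        '[/quote]',
        '',
        '[this should not be removed]',
        'Dont remove me',
    ]);

    // Same pattern as above: s = dot matches newlines, i = case insensitive.
    $pattern = '%\[(quote)=[^]]+\](?>(?R)|.)*?\[/quote]%si';

    // The outer quote and everything nested inside it go in a single match.
    $result = preg_replace($pattern, '', $subject);
    echo trim($result), "\n";

    // An opening tag that is never closed does not match, so the text is kept.
    $unclosed = '[quote=test]no closing tag';
    var_dump(preg_replace($pattern, '', $unclosed) === $unclosed);
    ```

    On the sample this should leave only the "[this should not be removed]" and "Dont remove me" lines (plus the blank lines before them, which `trim` strips here).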
    

    Human Readable

    # \[(quote)=[^]]+\](?>(?R)|.)*?\[/quote]
    # 
    # Options: Case insensitive; Exact spacing; Dot matches line breaks; ^$ don’t match at line breaks; Greedy quantifiers; Regex syntax only
    # 
    # Match the character “[” literally «\[»
    # Match the regex below and capture its match into backreference number 1 «(quote)»
    #    Match the character string “quote” literally (case insensitive) «quote»
    # Match the character “=” literally «=»
    # Match any character that is NOT a “]” «[^]]+»
    #    Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
    # Match the character “]” literally «\]»
    # Match the regular expression below; do not try further permutations of this group if the overall regex fails (atomic group) «(?>(?R)|.)*?»
    #    Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
    #    Match this alternative (attempting the next alternative only if this one fails) «(?R)»
    #       Match the entire regular expression (recursion; restore capturing groups upon exit; do not try further permutations of the recursion if the overall regex fails) «(?R)»
    #    Or match this alternative (the entire group fails if this one fails to match) «.»
    #       Match any single character «.»
    # Match the character “[” literally «\[»
    # Match the character string “/quote]” literally (case insensitive) «/quote]»
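
    For comparison, the parser-style approach mentioned at the top can be sketched with a simple depth counter instead of a recursive pattern. This is not part of the regex solution above, just one possible alternative; the function name `stripQuotes` is hypothetical, and it only knows about [quote=...] and [/quote] tags.

    ```php
    <?php
    // Hypothetical helper: removes balanced [quote=...]...[/quote] blocks,
    // including nested ones, by tracking the nesting depth.
    function stripQuotes(string $text): string
    {
        // Find every opening and closing quote tag together with its byte offset.
        preg_match_all('%\[quote=[^]]+\]|\[/quote\]%i', $text, $m, PREG_OFFSET_CAPTURE);

        $out = '';
        $depth = 0;
        $copyFrom = 0; // start of the text that has not been copied to $out yet

        foreach ($m[0] as [$tag, $offset]) {
            if ($tag[1] !== '/') {
                // Opening tag.
                if ($depth === 0) {
                    // Keep the text before an outermost quote, and remember where
                    // the quote starts in case it is never closed.
                    $out .= substr($text, $copyFrom, $offset - $copyFrom);
                    $copyFrom = $offset;
                }
                $depth++;
            } elseif ($depth > 0) {
                // Closing tag that has a matching opening tag.
                $depth--;
                if ($depth === 0) {
                    // Outermost quote closed: skip the whole block.
                    $copyFrom = $offset + strlen($tag);
                }
            }
            // A closing tag at depth 0 is stray and simply stays in the text.
        }

        // Append the remainder; an unclosed outermost quote is kept as-is.
        return $out . substr($text, $copyFrom);
    }

    echo stripQuotes('a[quote=x]b[quote=y]c[/quote]d[/quote]e'), "\n"; // prints "ae"
    ```

    One difference from the recursive pattern: if an outer quote is never closed, this sketch keeps everything from that tag onward, including any balanced quotes nested inside it, whereas the regex would still remove the balanced inner ones.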