javascript regex mybb

How can I match all MyCode, including nested tags, in regex?


MyCode is a tag-based formatting system for MyBB. Example formats relevant to capturing it are:

[quote]This is a quote[/quote]

[quote=Bob]This is a quote, [b]this bit is bold[/b], [quote] this is a nested quote [/quote][/quote]

[url=http://www.stackoverflow.com][color=#ff0000]This is an anchor with a red text color.[/color][/url]

[quote][b]
Tags can also span multiple lines.
[img]http://www.website.com/image.png[/img]
[/b]
[/quote]

So far, I've written a regex that successfully captures most of this but fails to capture certain nested elements. I feel like I need the regex to match "inner first", but I don't know how to do this (if it's possible at all).

For example, in the case of [quote]test [b]bold[/b][/quote], it needs to check the inner [b] tags first, then the [quote] tags. The same goes for [quote][quote]nested[/quote][/quote].

Here's what I've written so far; I annotated it to help explain myself.

\[(.*?)(=[^]]+)?]([\s\S]*?)\[\/\1]

(Image: annotated picture of matching groups)

https://regex101.com/r/emNAh2/1
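
To make the failure concrete, here is a minimal JavaScript sketch that runs the pattern against the nested-quote example. One editorial assumption: the character class is written as `[^\]]` rather than `[^]]`, because in JavaScript `[^]` on its own is a complete class that matches any character, so the unescaped form does not mean "anything except `]`" there the way it does in PCRE. The variable names are illustrative only.

```javascript
// The question's pattern, with the class escaped as [^\]] (an editorial change
// so that it means "anything except ]" in JavaScript as well as in PCRE).
const pattern = /\[(.*?)(=[^\]]+)?\]([\s\S]*?)\[\/\1\]/g;

const input = "[quote][quote]nested[/quote][/quote]";
const matches = [...input.matchAll(pattern)];

for (const m of matches) {
  console.log(`match: ${m[0]}`);
  console.log(`body: ${m[3]}`);
}
// match: [quote][quote]nested[/quote]
// body: [quote]nested
```

The lazy body stops at the first closing tag with the same name, so the outer opening [quote] gets paired with the inner [/quote], and the final [/quote] is left unmatched.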


Solution

  • Strictly speaking, it's true that a regular expression can't match arbitrarily nested tags, but in practice that statement is more wrong than right. Very few languages have a strictly regular implementation of regex, and the most popular regex library (PCRE) has no issues with this task.

    That said, doing this in regex is a terrible idea. ReDoS risk, poor readability, poor maintainability, and so on are each bad enough that any one of them alone could rule out a regex-based approach.

    But here's a regex-based solution anyway: https://regex101.com/r/q0zNBU/1
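
The linked pattern is not reproduced here. Since the question is tagged javascript, and JavaScript's built-in RegExp has no recursion construct, the following is a different technique: a minimal sketch of the "inner first" idea from the question, done by replacing only innermost tag pairs and repeating until nothing changes. The names `INNERMOST`, `render`, and `convertInnerFirst` are hypothetical, and the `<span>` output is purely illustrative; it is not MyBB's actual rendering and does no HTML escaping.

```javascript
// Matches one tag pair whose body contains no other tag-like token,
// i.e. an innermost pair. Group 1: tag name, 2: optional value, 3: body.
const INNERMOST =
  /\[(\w+)(?:=([^\]]+))?\]((?:(?!\[\/?\w+(?:=[^\]]+)?\])[\s\S])*)\[\/\1\]/g;

// Hypothetical renderer: wraps every tag in a <span>. A real one would map
// each MyCode tag to proper HTML and escape user input.
function render(name, value, body) {
  const attr = value === undefined ? "" : ` data-value="${value}"`;
  return `<span class="${name}"${attr}>${body}</span>`;
}

// Replace innermost pairs repeatedly. Each pass removes one nesting level,
// so enclosing tags become innermost on the next pass.
function convertInnerFirst(input) {
  let previous;
  let output = input;
  do {
    previous = output;
    output = output.replace(INNERMOST, (match, name, value, body) =>
      render(name, value, body)
    );
  } while (output !== previous);
  return output;
}

console.log(convertInnerFirst("[quote]test [b]bold[/b][/quote]"));
// <span class="quote">test <span class="b">bold</span></span>
```

One limitation of this sketch: if the text contains a tag-like token that never gets a matching close (for example a stray [b]), any pair enclosing it is left unconverted, because its body never becomes free of tag-like tokens.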