regex phpbb bbcode

RegEx expression needed for BBCode Tag


I'm converting my forum to phpBB. Both use similar BBCode tags, but the quote tag differs. In the old forum, a quote was formatted like this:

old: [quote=prattw]My text here.[/quote]

new: [quote="prattw"]My text here.[/quote]

I need a regex that will add the quotes around the username in the BBCode block. Many thanks!


Solution

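  • Assuming that the only valid characters for usernames are those matched by \w, i.e. [a-zA-Z0-9_] (in Python, when the re.ASCII flag is set), you can replace \[quote=(\w+)\] with [quote="\1"]. The brackets need escaping only in the pattern, not in the replacement string.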

    Caveats:

    • if your usernames are permitted to contain [] this may go horribly wrong.

      • You might be tempted to try a non-greedy any-character match like \[quote=(.+?)\], but for [quote=Clan[FROZEN] Supahkillah] this would stop at the first closing bracket and capture Clan[FROZEN.
      • The greedy version is even worse. \[quote=(.+)\] would run to the last closing bracket on the line, so for [quote=Alice][quote=Bob] it would capture Alice][quote=Bob.

      Remember that regexps aren't good at handling nested structures. You may want to use a parsing approach instead of a regular expression approach. (For HTML, this would be "use an XML parser"; for this example it would be "use a BBCode parser".)

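    • At least in Python, the meaning of \w depends on the flags in use. In Python 3, \w in a str pattern matches any Unicode word character by default, so it would also match the words in Главное в новостях. Pass re.ASCII to restrict it to [a-zA-Z0-9_]. With re.LOCALE (bytes patterns only), it is locale-dependent.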

    I still think you're trying to reinvent the wheel. Chances are that a script exists to automatically convert your old forum format to phpBB - you just have to look for it.
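
If no ready-made converter turns up, the substitution described above can be sketched in Python roughly as follows. This is a minimal sketch, not a tested migration script: it assumes usernames consist only of ASCII word characters, and the names `OLD_QUOTE` and `add_quotes` are made up for illustration. The last lines reproduce the two bracket caveats.

```python
import re

# Old-style opening tag. Assumption: usernames contain only [a-zA-Z0-9_];
# re.ASCII keeps \w from also matching non-ASCII letters.
OLD_QUOTE = re.compile(r'\[quote=(\w+)\]', re.ASCII)


def add_quotes(text: str) -> str:
    """Rewrite [quote=name] as [quote="name"]; everything else is untouched."""
    # Brackets are literal in the replacement string; \1 is the captured name.
    return OLD_QUOTE.sub(r'[quote="\1"]', text)


print(add_quotes('[quote=prattw]My text here.[/quote]'))

# Caveat 1: a username containing brackets is skipped by the \w+ pattern,
# while the non-greedy pattern stops at the first closing bracket.
tricky = '[quote=Clan[FROZEN] Supahkillah]hi[/quote]'
lazy_capture = re.search(r'\[quote=(.+?)\]', tricky).group(1)

# Caveat 2: the greedy pattern runs to the last closing bracket on the line.
nested = '[quote=Alice][quote=Bob]hi[/quote][/quote]'
greedy_capture = re.search(r'\[quote=(.+)\]', nested).group(1)

print(lazy_capture)
print(greedy_capture)
```

Because `\w+` cannot match a double quote, tags that are already in the new format are left alone, so running the function twice over the same post should be harmless. Posts whose usernames contain brackets or spaces come back unchanged and would need separate handling.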