Search code examples
c#mysqlregexmarkdownbbcode

Need help converting bbcode URLs that contain spaces to valid markdown


As part of some data migration I'm converting user content from bbcode to markdown.

I'm doing the conversion in a custom MySQL 8.0.22 function, which contains the following line to convert bbcode [url] tags to markdown:

...
SET markdown = REGEXP_REPLACE(markdown, '\\[url=([^\\]]+)\\](.*?)\\[\\/url\\]', '[$2]($1)', 1, 0, 'i');
...

This performs as expected, for example:

[url=https://stackoverflow.com/]SO[/url] converts corectly to [SO](https://stackoverflow.com/)

The problem is: some of the URLs contain spaces, which isn't valid markdown and isn't displaying correctly on my client-side.

Is it possible to modify my REGEXP_REPLACE statement to replace spaces in the links with "%20"?

I'd like to do it all in MySQL if possible, but can do the processing on each record in C# if necessary.


For completeness, my entire bbcode to markdown function is:

CREATE DEFINER=`root`@`localhost` FUNCTION `func_bb_to_md`(bbcode MEDIUMTEXT) RETURNS mediumtext CHARSET utf8mb4
    DETERMINISTIC
BEGIN
    DECLARE markdown MEDIUMTEXT;
    SET markdown = bbcode;
    
    SET markdown = REGEXP_REPLACE(markdown, '\\[code\\](.*?)\\[\\/code\\]', '`$1`', 1, 0, 'i'); 
    SET markdown = REGEXP_REPLACE(markdown, '\\[url\\](.*?)\\[\\/url\\]', '<$1>', 1, 0, 'i'); 
    SET markdown = REGEXP_REPLACE(markdown, '\\[url=([^\\]]+)\\](.*?)\\[\\/url\\]', '[$2]($1)', 1, 0, 'i'); 
    
    SET markdown = REGEXP_REPLACE(markdown, '\\[img\\](.*?)\\[\\/img\\]', '![]($1)', 1, 0, 'i'); 
    
    SET markdown = REGEXP_REPLACE(markdown, '\\[yt\\](.*?)\\[\\/yt\\]', '![]($1)', 1, 0, 'i'); 
    
    SET markdown = REGEXP_REPLACE(markdown, '\\[b\\](.*?)\\[\\/b\\]', '**$1**', 1, 0, 'i'); 
    SET markdown = REGEXP_REPLACE(markdown, '\\[i\\](.*?)\\[\\/i\\]', '*$1*', 1, 0, 'i'); 
    SET markdown = REGEXP_REPLACE(markdown, '\\[u\\](.*?)\\[\\/u\\]', '$1', 1, 0, 'i'); 
    
    SET markdown = REPLACE(markdown, '[list]', ''); 
    SET markdown = REPLACE(markdown, '[list=1]', ''); 
    SET markdown = REPLACE(markdown, '[/list]', ''); 
    SET markdown = REPLACE(markdown, '[*]', '* '); 
    
    SET markdown = REGEXP_REPLACE(markdown, '\\[color=([^\\]]+)\\](.*?)\\[\\/color\\]', '$2', 1, 0, 'i'); 
    SET markdown = REGEXP_REPLACE(markdown, '\\[quote\\](.*?)\\[\\/quote\\]', '> $1', 1, 0, 'i'); 
    
    SET markdown = REPLACE(markdown, ':)', '{{slightly_smiling_face}}');
    SET markdown = REPLACE(markdown, ';)', '{{wink}}');
    SET markdown = REPLACE(markdown, ':D', '{{grin}}');
    SET markdown = REPLACE(markdown, ':P', '{{stuck_out_tongue}}');
    SET markdown = REPLACE(markdown, ':(', '{{frowning_face}}');
    SET markdown = REPLACE(markdown, ':''(', '{{cry}}');
    SET markdown = REPLACE(markdown, ':.', '{{flushed}}');
    SET markdown = REPLACE(markdown, ':|', '{{neutral_face}}');
    SET markdown = REPLACE(markdown, ':O', '{{open_mouth}}');
    SET markdown = REPLACE(markdown, ':@', '{{angry}}');
    SET markdown = REPLACE(markdown, ':S', '{{confused}}');
    SET markdown = REPLACE(markdown, ':$', '{{blush}}');
    SET markdown = REGEXP_REPLACE(markdown, '\\{\\{(.*?)\\}\\}', ':$1:', 1, 0, 'i');
    
    SET markdown = REPLACE(markdown, '\r\n', '\n');
    SET markdown = REPLACE(markdown, '\n', '\r\n');

    RETURN markdown;
END

Solution

  • In C#, you may use Regex.Replace with a match evaluator to manipulate the captured texts. In MySQL REGEXP_REPLACE, you do not have this option.

    So, you can use

    var markdown = "[url=https://stackoverflow.com/a b]SO[/url]";
    var p = @"(?i)\[url=([^]]+)](.*?)\[/url]";
    var result = Regex.Replace(markdown, p, x => 
            $"[{x.Groups[2].Value}]({x.Groups[1].Value.Replace(" ","%20")})");
    Console.WriteLine(result);
    

    See this C# demo online.

    The (?i)\[url=([^]]+)](.*?)\[/url] regex matches (in a case insensitive way) [url=, captures any one or more chars other than ] into Group 1, then [/url] substring. The match is passed to the match evaluator, x is a match object. $"[{x.Groups[2].Value}]({x.Groups[1].Value.Replace(" ","%20")})") does all the necessary manipulation.