As part of some data migration I'm converting user content from bbcode to markdown.
I'm doing the conversion in a custom MySQL 8.0.22 function, which contains the following line to convert bbcode [url]
tags to markdown:
...
SET markdown = REGEXP_REPLACE(markdown, '\\[url=([^\\]]+)\\](.*?)\\[\\/url\\]', '[$2]($1)', 1, 0, 'i');
...
This performs as expected, for example:
[url=https://stackoverflow.com/]SO[/url]
converts corectly to [SO](https://stackoverflow.com/)
The problem is: some of the URLs contain spaces, which isn't valid markdown and isn't displaying correctly on my client-side.
Is it possible to modify my REGEXP_REPLACE
statement to replace spaces in the links with "%20"?
I'd like to do it all in MySQL if possible, but can do the processing on each record in C# if necessary.
For completeness, my entire bbcode to markdown function is:
CREATE DEFINER=`root`@`localhost` FUNCTION `func_bb_to_md`(bbcode MEDIUMTEXT) RETURNS mediumtext CHARSET utf8mb4
DETERMINISTIC
BEGIN
DECLARE markdown MEDIUMTEXT;
SET markdown = bbcode;
SET markdown = REGEXP_REPLACE(markdown, '\\[code\\](.*?)\\[\\/code\\]', '`$1`', 1, 0, 'i');
SET markdown = REGEXP_REPLACE(markdown, '\\[url\\](.*?)\\[\\/url\\]', '<$1>', 1, 0, 'i');
SET markdown = REGEXP_REPLACE(markdown, '\\[url=([^\\]]+)\\](.*?)\\[\\/url\\]', '[$2]($1)', 1, 0, 'i');
SET markdown = REGEXP_REPLACE(markdown, '\\[img\\](.*?)\\[\\/img\\]', '', 1, 0, 'i');
SET markdown = REGEXP_REPLACE(markdown, '\\[yt\\](.*?)\\[\\/yt\\]', '', 1, 0, 'i');
SET markdown = REGEXP_REPLACE(markdown, '\\[b\\](.*?)\\[\\/b\\]', '**$1**', 1, 0, 'i');
SET markdown = REGEXP_REPLACE(markdown, '\\[i\\](.*?)\\[\\/i\\]', '*$1*', 1, 0, 'i');
SET markdown = REGEXP_REPLACE(markdown, '\\[u\\](.*?)\\[\\/u\\]', '$1', 1, 0, 'i');
SET markdown = REPLACE(markdown, '[list]', '');
SET markdown = REPLACE(markdown, '[list=1]', '');
SET markdown = REPLACE(markdown, '[/list]', '');
SET markdown = REPLACE(markdown, '[*]', '* ');
SET markdown = REGEXP_REPLACE(markdown, '\\[color=([^\\]]+)\\](.*?)\\[\\/color\\]', '$2', 1, 0, 'i');
SET markdown = REGEXP_REPLACE(markdown, '\\[quote\\](.*?)\\[\\/quote\\]', '> $1', 1, 0, 'i');
SET markdown = REPLACE(markdown, ':)', '{{slightly_smiling_face}}');
SET markdown = REPLACE(markdown, ';)', '{{wink}}');
SET markdown = REPLACE(markdown, ':D', '{{grin}}');
SET markdown = REPLACE(markdown, ':P', '{{stuck_out_tongue}}');
SET markdown = REPLACE(markdown, ':(', '{{frowning_face}}');
SET markdown = REPLACE(markdown, ':''(', '{{cry}}');
SET markdown = REPLACE(markdown, ':.', '{{flushed}}');
SET markdown = REPLACE(markdown, ':|', '{{neutral_face}}');
SET markdown = REPLACE(markdown, ':O', '{{open_mouth}}');
SET markdown = REPLACE(markdown, ':@', '{{angry}}');
SET markdown = REPLACE(markdown, ':S', '{{confused}}');
SET markdown = REPLACE(markdown, ':$', '{{blush}}');
SET markdown = REGEXP_REPLACE(markdown, '\\{\\{(.*?)\\}\\}', ':$1:', 1, 0, 'i');
SET markdown = REPLACE(markdown, '\r\n', '\n');
SET markdown = REPLACE(markdown, '\n', '\r\n');
RETURN markdown;
END
In C#, you may use Regex.Replace
with a match evaluator to manipulate the captured texts. In MySQL REGEXP_REPLACE
, you do not have this option.
So, you can use
var markdown = "[url=https://stackoverflow.com/a b]SO[/url]";
var p = @"(?i)\[url=([^]]+)](.*?)\[/url]";
var result = Regex.Replace(markdown, p, x =>
$"[{x.Groups[2].Value}]({x.Groups[1].Value.Replace(" ","%20")})");
Console.WriteLine(result);
See this C# demo online.
The (?i)\[url=([^]]+)](.*?)\[/url]
regex matches (in a case insensitive way) [url=
, captures any one or more chars other than ]
into Group 1, then [/url]
substring. The match is passed to the match evaluator, x
is a match object. $"[{x.Groups[2].Value}]({x.Groups[1].Value.Replace(" ","%20")})")
does all the necessary manipulation.