I'm trying to convert plain-text links (fully qualified URLs only, i.e. no relative links) in Markdown text into Markdown links. It works fine except when the source Markdown already contains the URL as a Markdown link. For example, this is the source text:
Login in to My site [https://example.com/](https://example.com/) and select Something > Select below details further.
(https://example.com/abc/1.html)
Also have a look at https://example.com/abc/1.html
My regex: /(?<!\]\()(\b(https?|ftp):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/gim
Expected: match only the second and third link. Current outcome: matches 3 URLs.
I tried adding a negative lookahead at the end, similar to the negative lookbehind at the beginning but that just omits the last character of the URL which is a bummer!
I'm using this in NodeJS.
Here's a link to the regex101 with the sample data
You can use a pattern to match what you do not want, and capture what you do want in group 1.
You can pass a callback function to replace as the replacement.
In the callback, check if group 1 exists. If it does, return your custom replacement. If it does not, return the full match unchanged.
\[(?:https?|ftp):\/\/[^\]\[]+\]\([^()]*\)|((?:https?|ftp):\/\/\S+)
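As a rough sketch of that callback approach in Node.js (the helper name linkifyBareUrls and the [url](url) output format are assumptions for illustration, not something the question specifies):

```javascript
// First alternative consumes an existing [url](url) link without capturing;
// second alternative captures a bare URL in group 1.
const pattern = /\[(?:https?|ftp):\/\/[^\]\[]+\]\([^()]*\)|((?:https?|ftp):\/\/\S+)/gi;

// Hypothetical helper: wraps bare URLs, leaves existing Markdown links alone.
function linkifyBareUrls(markdown) {
  return markdown.replace(pattern, (match, url) =>
    // url is undefined when the first alternative matched.
    url ? `[${url}](${url})` : match
  );
}

const sample = [
  "Login in to My site [https://example.com/](https://example.com/) and select Something > Select below details further.",
  "Also have a look at https://example.com/abc/1.html",
].join("\n");

console.log(linkifyBareUrls(sample));
```

The sample here leaves out the line where the URL is wrapped in parentheses, because \S+ would also capture the closing parenthesis.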
In parts, the pattern matches:
- \[ matches a literal [
- (?:https?|ftp):\/\/ matches one of the protocols followed by ://
- [^\]\[]+ matches one or more characters other than [ and ]
- \] matches a literal ]
- \([^()]*\) matches from ( through the next )
- | or
- ((?:https?|ftp):\/\/\S+) captures a URL-like string in group 1

To avoid matching parentheses in the URL:
\[(?:https?|ftp):\/\/[^\]\[]+\]\([^()]*\)|((?:https?|ftp):\/\/[^()\s]+)
Or, to specifically capture a URL between parentheses (the parenthesized URL is then in group 1 and a bare URL in group 2):
\[(?:https?|ftp):\/\/[^\]\[]+\]\([^()]*\)|\(((?:https?|ftp):\/\/\S+)\)|((?:https?|ftp):\/\/[^()\s]+)
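With two capture groups, the callback has to check both. A minimal sketch, assuming the surrounding parentheses should be kept in the output (the helper name linkifyAllUrls is hypothetical):

```javascript
// Alternative 1: existing [url](url) link, no capture.
// Alternative 2: URL inside parentheses, captured in group 1.
// Alternative 3: bare URL without parentheses, captured in group 2.
const patternWithParens =
  /\[(?:https?|ftp):\/\/[^\]\[]+\]\([^()]*\)|\(((?:https?|ftp):\/\/\S+)\)|((?:https?|ftp):\/\/[^()\s]+)/gi;

// Hypothetical helper; keeping the outer parentheses is an assumption.
function linkifyAllUrls(markdown) {
  return markdown.replace(patternWithParens, (match, inParens, bare) => {
    if (inParens) return `([${inParens}](${inParens}))`;
    if (bare) return `[${bare}](${bare})`;
    return match; // already a Markdown link, leave untouched
  });
}

const sampleText = [
  "Login in to My site [https://example.com/](https://example.com/) and select Something > Select below details further.",
  "(https://example.com/abc/1.html)",
  "Also have a look at https://example.com/abc/1.html",
].join("\n");

console.log(linkifyAllUrls(sampleText));
```

On the sample text from the question, this should leave the first link as it is and convert only the second and third URLs.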