node.js, regex, regex-lookarounds

Regex to convert text URLs in Markdown to Links


I'm trying to convert plain-text URLs (fully qualified, i.e. no relative links) in Markdown text into Markdown links. It works fine except when the source Markdown already contains URLs that have been converted to links. For example, this is the source text:

Login in to My site [https://example.com/](https://example.com/) and select Something > Select below details further.
(https://example.com/abc/1.html)

Also have a look at https://example.com/abc/1.html

My regex: /(?<!\]\()(\b(https?|ftp):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/gim.

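Expected: match only the second and third URL (the one in parentheses and the one on the last line), not the one that is already a Markdown link. Current outcome: 3 URLs match, because the URL inside the square brackets of the existing link is matched as well.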
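
To make the problem reproducible, here is a minimal sketch that runs the regex from above against the sample text in Node.js. The variable names (`sample`, `original`, `found`) are only illustrative. The lookbehind rejects a URL preceded by `](`, but the URL inside the square brackets is preceded by `[`, so it still matches.

```javascript
// Sample text from the question.
const sample = [
  "Login in to My site [https://example.com/](https://example.com/) and select Something > Select below details further.",
  "(https://example.com/abc/1.html)",
  "",
  "Also have a look at https://example.com/abc/1.html",
].join("\n");

// The regex from the question, unchanged.
const original = /(?<!\]\()(\b(https?|ftp):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/gim;

// Collect the full text of every match.
const found = [...sample.matchAll(original)].map((m) => m[0]);

console.log(found);
// [
//   'https://example.com/',            <- inside [...] of the existing link (unwanted)
//   'https://example.com/abc/1.html',  <- inside the parentheses
//   'https://example.com/abc/1.html'   <- the bare URL on the last line
// ]
```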

I tried adding a negative lookahead at the end, similar to the negative lookbehind at the beginning, but that just drops the last character of the URL from the match, which is a bummer!

I'm using this in Node.js.

Here's a link to the regex101 with the sample data


Solution

  • You can use a pattern that matches what you do not want, and captures what you do want in group 1.

    You can then pass a callback function as the replacement argument of replace.

    In the callback, check if group 1 exists. If it does, return your custom replacement. If it does not exist, return the full match so that text stays unchanged.

    \[(?:https?|ftp):\/\/[^\]\[]+\]\([^()]*\)|((?:https?|ftp):\/\/\S+)
    

    In parts the pattern matches:

    • \[ Match [
    • (?:https?|ftp):\/\/ Match one of the protocols and ://
    • [^\]\[]+ Match 1+ times any char except [ and ]
    • \] Match ]
    • \([^()]*\) Match from ( to )
    • | Or
    • ((?:https?|ftp):\/\/\S+) Capture in group 1 a URL-like format
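
Put together, the approach might look like the sketch below. The names `basicPattern` and `linkifyBasic` are made up for this example, and the `g` and `i` flags are an assumption (the pattern above is shown without flags; `g` is needed so that replace handles every occurrence).

```javascript
// First alternative: an existing [url](url) link, matched but not captured.
// Second alternative: a URL-like string, captured in group 1.
const basicPattern = /\[(?:https?|ftp):\/\/[^\]\[]+\]\([^()]*\)|((?:https?|ftp):\/\/\S+)/gi;

// Hypothetical helper: wraps captured URLs in a Markdown link and
// returns everything else (the existing links) unchanged.
function linkifyBasic(markdown) {
  return markdown.replace(basicPattern, (match, url) =>
    url !== undefined ? `[${url}](${url})` : match
  );
}

console.log(linkifyBasic("Login [https://example.com/](https://example.com/) first"));
// Login [https://example.com/](https://example.com/) first

console.log(linkifyBasic("Also have a look at https://example.com/abc/1.html"));
// Also have a look at [https://example.com/abc/1.html](https://example.com/abc/1.html)
```

Note that \S+ is greedy up to the next whitespace, so a URL written as (https://example.com/abc/1.html) would be captured together with the closing parenthesis.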


    To not match parentheses in the URL:

    \[(?:https?|ftp):\/\/[^\]\[]+\]\([^()]*\)|((?:https?|ftp):\/\/[^()\s]+)
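
As a rough check, this variant can be run against the sample text from the question. The names `noParensPattern`, `linkifyNoParens` and `input` are illustrative, and the `gi` flags are again an assumption.

```javascript
// Same idea as before, but group 1 stops at whitespace and at ( or ).
const noParensPattern = /\[(?:https?|ftp):\/\/[^\]\[]+\]\([^()]*\)|((?:https?|ftp):\/\/[^()\s]+)/gi;

// Hypothetical helper: only matches with group 1 set are rewritten.
function linkifyNoParens(markdown) {
  return markdown.replace(noParensPattern, (match, url) =>
    url !== undefined ? `[${url}](${url})` : match
  );
}

// Sample text from the question.
const input = [
  "Login in to My site [https://example.com/](https://example.com/) and select Something > Select below details further.",
  "(https://example.com/abc/1.html)",
  "",
  "Also have a look at https://example.com/abc/1.html",
].join("\n");

console.log(linkifyNoParens(input));
// Login in to My site [https://example.com/](https://example.com/) and select Something > Select below details further.
// ([https://example.com/abc/1.html](https://example.com/abc/1.html))
//
// Also have a look at [https://example.com/abc/1.html](https://example.com/abc/1.html)
```

The existing link is left alone, and the surrounding parentheses on the second line stay outside the generated link.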
    


    Or specifically capture a URL between parentheses:

    \[(?:https?|ftp):\/\/[^\]\[]+\]\([^()]*\)|\(((?:https?|ftp):\/\/\S+)\)|((?:https?|ftp):\/\/[^()\s]+)
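
This pattern has two capture groups, so the callback has to check both. A minimal sketch, assuming you want to keep the surrounding parentheses in the output (that choice, the names `withParensPattern` and `linkifyWithParens`, and the `gi` flags are assumptions for this example):

```javascript
// Group 1: a URL wrapped in ( ... ); group 2: a bare URL without parentheses.
const withParensPattern = /\[(?:https?|ftp):\/\/[^\]\[]+\]\([^()]*\)|\(((?:https?|ftp):\/\/\S+)\)|((?:https?|ftp):\/\/[^()\s]+)/gi;

// Hypothetical helper: the callback receives the full match, then one
// argument per capture group (undefined when that group did not take part).
function linkifyWithParens(markdown) {
  return markdown.replace(withParensPattern, (match, inParens, bare) => {
    if (inParens !== undefined) return `([${inParens}](${inParens}))`;
    if (bare !== undefined) return `[${bare}](${bare})`;
    return match; // an existing Markdown link: leave it unchanged
  });
}

console.log(linkifyWithParens("(https://example.com/abc/1.html)"));
// ([https://example.com/abc/1.html](https://example.com/abc/1.html))

console.log(linkifyWithParens("Also have a look at https://example.com/abc/1.html"));
// Also have a look at [https://example.com/abc/1.html](https://example.com/abc/1.html)
```

To drop the parentheses instead, return `[${inParens}](${inParens})` in the first branch.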
    
