javascript, regex, markdown, rich-text-editor

Regex for detecting url in plain form and in markdown


I am trying to capture user input in a textarea that might be a url (and similarly email) in any of the three formats -

  1. Just plain url.
  2. Markdown with title [text](url "title")
  3. Markdown without title [text](url)

Now, I have a regex (JavaScript) for each of the three formats, and each one works on its own. But if I want to handle all 3, the first one prevents the second and third from activating. In my code, regex detection is triggered when the user types a space. Therefore, if the first regex is active, it matches the URL inside the markdown link before the closing part is typed, and the markdown regexes are never triggered.

I am wondering if it is possible to have a regex for the 1st format that specifically excludes the 2nd and the 3rd? Or, even better, is there a single regex that matches all 3?

Also, since I am not that good at regex, I'd love it if someone could explain their solution, so that I could try to do the same for email detection.

Thank you!


Solution

  • Firstly, the second regex already works for the third format, so we only need to join the first and second ones.

    The simple way to do this is to use the | ("OR") character, like this:

    (<firstRegex>)|(<secondRegex>)

    Demo

    The problem with this is that it messes up the capturing groups. If the regex matches the first pattern, the url ends up in a different capturing group (the 4th in my demo) than if it matches the second one (the 2nd group).
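
One way to sidestep the shifting group numbers is to use named capture groups. The sketch below is only an illustration: the asker's actual regexes are not shown, so `markdownLink`, `plainUrl`, and `extractUrl` are hypothetical names, and the URL pattern is deliberately simplistic (it only accepts `http://` or `https://` followed by non-space characters).

```javascript
// Hypothetical, simplified patterns (assumptions, not the asker's real regexes).
// Markdown link: [text](url) with an optional "title" after the url.
const markdownLink =
  /\[(?<text>[^\]]*)\]\((?<mdUrl>https?:\/\/[^\s)]+)(?:\s+"(?<title>[^"]*)")?\)/;
// Plain url: http(s):// followed by anything that is not whitespace or ")".
const plainUrl = /(?<plainUrl>https?:\/\/[^\s)]+)/;

// Join the two with "|". The group names must differ between alternatives
// so the pattern compiles on engines that reject duplicate names.
const combined = new RegExp(`${markdownLink.source}|${plainUrl.source}`);

function extractUrl(input) {
  const m = combined.exec(input);
  if (!m) return null;
  // Whichever alternative matched, read the url by name, not by index.
  const { text, mdUrl, title, plainUrl: bareUrl } = m.groups;
  return {
    url: mdUrl ?? bareUrl,
    text: text ?? null,
    title: title ?? null,
  };
}
```

With this, `extractUrl('[a](http://x.com "t")')` and `extractUrl('go http://x.com')` both expose the url under the same `url` property, regardless of which side of the `|` matched.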

    Excluding markdown pattern on plain URL regex

    Adding (?:^|[^\(\/]) to the beginning of the plain URL pattern requires the url to be preceded either by the start of the input or by a character that is neither an opening parenthesis nor a slash, thus excluding the markdown case. The url must be extracted using a capturing group, since that preceding character will be included in the match.

    Demo
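
As a rough sketch of the exclusion prefix in use, assume the same simplistic plain-URL pattern as before (`https?://` followed by non-space characters). That pattern and the helper name `findPlainUrl` are illustrative assumptions, not the regex from the demo.

```javascript
// (?:^|[^\(\/])  start of input, or one character that is not "(" or "/"
// (https?:...)   the url itself, captured as group 1
const plainUrlOnly = /(?:^|[^\(\/])(https?:\/\/[^\s)]+)/;

function findPlainUrl(input) {
  const m = plainUrlOnly.exec(input);
  // m[0] may include the character before the url, so return group 1 instead.
  return m ? m[1] : null;
}
```

Here `findPlainUrl("see http://x.com")` returns the url, while `findPlainUrl("[a](http://x.com)")` returns `null` because the url is preceded by `(`. On engines that support lookbehind (ES2018 and later), a negative lookbehind such as `(?<![(\/])` could play the same role without consuming the preceding character.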