javascript, regex, regex-group, string-matching

Getting links from markdown file even if link is in the link text


I have a Markdown file with some links in it, and I'm trying to grab all of them along with their link text. This works fine for simple links, but I can't figure out how to match a link whose text is itself an image link.

If I have an image link like [![alt text](https://example.com/image.svg)](https://other-example.com), I'd like to grab both links and both link texts.

I came up with two regexes:

  • /\[([^\]!]+)]\((https:\/\/[^\)]+)\)/gi
  • /\[([^\[!]+)](\(https:\/\/[^\)]+\))/gi

let str = `# Title with Image [![alt text](https://example.com/image.svg)](https://other-example.com)

## Links

- [First](https://somesite.com/path/to-page) - voluptates explicabo debitis aspernatur dolor, qui dolores.
- [Second](https://example.io/this/is/page) - molestiae animi eius nisi quam quae quisquam beatae reiciendis.`

let regex1 = /\[([^\]!]+)]\((https:\/\/[^\)]+)\)/gi
let regex2 = /\[([^\[!]+)](\(https:\/\/[^\)]+\))/gi
let links1 = [...str.matchAll(regex1)].map((m) => ({ text: m[1], link: m[2] }))
let links2 = [...str.matchAll(regex2)].map((m) => ({ text: m[1], link: m[2] }))

console.log(links1)
console.log(links2)

The expected result would be (order doesn't matter):

[
  {
    "text": "![alt text](https://example.com/image.svg)",
    "link": "https://other-example.com"
  },
  {
    "text": "alt text",
    "link": "https://example.com/image.svg"
  },
  {
    "text": "First",
    "link": "https://somesite.com/path/to-page"
  },
  {
    "text": "Second",
    "link": "https://example.io/this/is/page"
  }
]



Solution

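  • ([^\]!]+) excludes ! from the link text, so it can never match ![alt text](https://example.com/image.svg) as the text of the outer link. I replaced it with (!\[.+?\]\(.+?\)|.+?), which first tries to match a whole image link such as ![alt text](https://example.com/image.svg) and otherwise falls back to plain text such as alt text, in both cases wrapped in [] (square brackets).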

    /(?=\[(!\[.+?\]\(.+?\)|.+?)]\((https:\/\/[^\)]+)\))/gi
    

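    Note that the two matches overlap: the image link sits inside the outer link, and a regular match consumes the characters it covers, so the inner link would never be found. Wrapping the whole pattern in a positive lookahead makes every match zero-length, so matching restarts at the next character while the capture groups still hold the text and the link.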

    let str = `# Title with Image [![alt text](https://example.com/image.svg)](https://other-example.com)
    
    ## Links
    
    - [First](https://somesite.com/path/to-page) - voluptates explicabo debitis aspernatur dolor, qui dolores.
    - [Second](https://example.io/this/is/page) - molestiae animi eius nisi quam quae quisquam beatae reiciendis.`
    
    let regex = /(?=\[(!\[.+?\]\(.+?\)|.+?)]\((https:\/\/[^\)]+)\))/gi
    
    let links = [...str.matchAll(regex)].map((m) => ({ text: m[1], link: m[2] }))
    
    console.log(links)
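
To see why the lookahead matters, here is a minimal sketch that runs the same pattern twice on the image link from the question, once as a normal (consuming) pattern and once wrapped in a positive lookahead. The helper name extractLinks and the shortened sample string are only illustrative assumptions, not part of the original answer.

```javascript
// Shortened sample based on the first line of the question's input.
const sample =
  "# Title [![alt text](https://example.com/image.svg)](https://other-example.com)";

// The same pattern twice: consuming vs. wrapped in a positive lookahead.
const consuming = /\[(!\[.+?\]\(.+?\)|.+?)]\((https:\/\/[^\)]+)\)/gi;
const lookahead = /(?=\[(!\[.+?\]\(.+?\)|.+?)]\((https:\/\/[^\)]+)\))/gi;

// Hypothetical helper: collect { text, link } pairs for a given pattern.
function extractLinks(input, pattern) {
  return [...input.matchAll(pattern)].map((m) => ({ text: m[1], link: m[2] }));
}

// The consuming pattern swallows the whole outer link, inner link included.
const consumed = extractLinks(sample, consuming);
// The lookahead pattern matches zero characters, so the inner link is found too.
const overlapping = extractLinks(sample, lookahead);

console.log(consumed.length);    // 1 (outer link only)
console.log(overlapping.length); // 2 (outer link, then inner image link)
console.log(overlapping);
```

One caveat worth keeping in mind: because .+? accepts any character except a line break, a line such as "[note] see [docs](https://example.com)" would probably yield "note] see [docs" as one of the link texts. If your files contain bracketed text that is not a link, a stricter character class for the plain-text branch may be a better fit.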