Search code examples
regexregexp-replace

Regex to append extension to file path


I'm trying to get running a GitHub action link checker that checks markdown files for URLs. I have a Jekyll site.

The link checker looks for files, rather than full URLs, as a result:

This fails - /docs/article1/

But this works - /docs/article1.md

I have the following rewrite in place that works for most use cases:

{
  "pattern": "(\\S+)\/(\\s|$)",
  "replacement": "$1$2"
}

However, it does not work if the trailing slash is missing from the the link.

Can someone recommend a regex update that will capture either:

  • /docs/article1/
  • /docs/article1

And rewrite it as /docs/article1.md?


Solution

  • If the string you're checking will be always

    /<name>/<name>[/]

    then (\S+?)\/(\S+?)\/?$ with substitution $1/$2.md should work. (regex101 link)

    Regex explanation

    (     // group 1
    \\S+?    // any character, more than 1, lazy (until there is '/')
    )     // close group 1
    \/    // the '/' between the names
    (     // group 2
    \\S+?    // any character, more than 1, lazy (until there is '/' or $(end of line))
    )     // close group 2
    \/?   // ignore if there is closing '/'
    $     // the end of line
    

    (Here is a snippet showing that it works)

    let reg = {
      "pattern": "(\\S+?)\/(\\S+?)\/?$",
      "replacement": "$1/$2.md"
    };
    
    let tests = ["/a/b","/a/b/","/document/data/", "/document/data", "/docs/article1/", "/docs/article1"];
    
    // run RegExp replace on every one from tests
    tests.forEach(e => {
      let replaced =
      e.replace(new RegExp(reg.pattern), reg.replacement)
       console.log(e, "=>", replaced)
    })

    From here:

    Appending the ? character to a quantifier makes it lazy; it causes the regular expression engine to match as few occurrences as possible

    And from here:

    By default quantifiers like * and + are "greedy", meaning that they try to match as much of the string as possible. The ? character after the quantifier makes the quantifier "non-greedy": meaning that it will stop as soon as it finds a match.