
mod_rewrite: replace underscores with dashes


I'm revealing my embarrassing ignorance of REGEX-fu here, but: I currently have a website where a load of the articles' URLs are written as "article_name", whilst the newer ones are written as "article-name".

I want to move all of them to using dashes, so is there a regular expression I could use to rewrite the older URLs to their newer equivalents?

Thanking you in advance!


Solution

  • First, make the existing URLs consistent: normalize all existing article names so that they always use dashes. Assuming you've done that, what remains is redirecting the old underscore URLs to the new ones.

    We're starting with the following assumption:

    The URL is roughly of the form:

    http://example.com/articles/what-ever/really-doesnt_matter/faulty_article_name
    

    where only URLs under /articles should be rewritten, and only the /faulty_article_name part needs to be sanitized.

    Greatly updated, with something that actually works

    For Apache:

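    RewriteEngine     On
    # Two or more underscores left in the article name: replace one, then restart ([N])
    RewriteRule       ^(/?articles/.*/[^/]*?)_([^/]*?_[^/]*)$ $1-$2 [N]
    # Exactly one underscore left: replace it and send a permanent redirect
    RewriteRule       ^(/?articles/.*/[^/]*?)_([^/_]*)$       $1-$2 [R=301,L]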
    

    That's generally inspired by GApple's answer.

    The leading /? lets the same rules work in both vhost configuration and .htaccess files. In .htaccess (per-directory) context, the path is matched without its leading slash, so the slash has to be optional.

    I then add the articles/ part to ensure that the rules only apply for URLs within /articles.

    Then, as long as the article name (the last path segment) still contains at least two underscores, the first rule replaces one of them with a dash and the [N] flag restarts the rewriting from the top. Once only one underscore remains, the second rule kicks in, replaces it with a dash, and issues a permanent (301) redirect.

    Phew.
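
To see how the two rules cooperate, here is a rough Python sketch that mimics them outside Apache. It is not part of the actual setup: the function name rewrite and the while loop standing in for the [N] flag are assumptions made for illustration, and it only imitates the regex matching, not mod_rewrite itself.

```python
import re

# The two RewriteRule patterns, copied as-is (the syntax is valid in Python's re).
MANY_UNDERSCORES = re.compile(r'^(/?articles/.*/[^/]*?)_([^/]*?_[^/]*)$')
LAST_UNDERSCORE = re.compile(r'^(/?articles/.*/[^/]*?)_([^/_]*)$')


def rewrite(path):
    """Return (new_path, redirected), roughly as the two rules would behave."""
    # Rule 1 with [N]: replace one underscore and start over while the
    # article name still holds two or more underscores.
    while True:
        match = MANY_UNDERSCORES.match(path)
        if not match:
            break
        path = match.group(1) + "-" + match.group(2)

    # Rule 2 with [R=301]: replace the final underscore and redirect.
    match = LAST_UNDERSCORE.match(path)
    if match:
        return match.group(1) + "-" + match.group(2), True

    # No underscore in the article name, or not under /articles: leave untouched.
    return path, False


old = "/articles/what-ever/really-doesnt_matter/faulty_article_name"
print(rewrite(old))
```

Only the last path segment changes; the underscore in really-doesnt_matter is left alone. Note that, as written, both patterns require at least one directory level between /articles/ and the article name, so a URL such as /articles/some_name would not be matched.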