apache, .htaccess, mod-rewrite, http-redirect

When redirecting subdomains to folders, how can I avoid folder-subdomain duplication for URLs with trailing slash?


I have a strange problem with internal mod_rewrite redirects on Apache 2.4.

In my .htaccess file I redirect a subdomain sub to a folder /sub with the following directives:

RewriteCond %{HTTP_HOST} ^sub.mydomain.com$ [NC]
RewriteRule ^((?!sub).*)$ /sub/$1  [NC]

This works perfectly for, say, https://sub.mydomain.com/articles/ - the URL stays like this in the browser's address field and, as expected, the data from the location at /sub/articles/index.html is served.

However, when I type https://sub.mydomain.com/articles into the browser (note the missing slash), the URL is changed in the browser to https://sub.mydomain.com/sub/articles/ (note the duplicated sub as folder and subdomain!).

I guess this is caused by Apache's default behavior of adding a slash to slashless directory requests via an external redirect. The added slash is OK with me, but of course I want to avoid the folder-subdomain duplication. How can I do this?


Solution

  • Yes, this is caused by mod_dir appending a slash (with a 301 redirect) to the directory after the rewrite has occurred, exposing the internally rewritten URL/directory.

    The canonical URL therefore needs to be /articles/ (with a trailing slash), not /articles. We can correct this with an external redirect before the rewrite occurs.

    (This avoids having to disable DirectorySlash, which would still leave you with a canonicalization / duplicate content issue.)

    For example, before the existing rewrite, test to see if the requested URL-path (that is missing a trailing slash) exists as a directory in the /sub directory and append a slash if that is the case.

    # Redirect to append trailing slash if exists as a dir inside "/sub"
    RewriteCond %{HTTP_HOST} ^sub\.mydomain\.com [NC]
    RewriteCond %{DOCUMENT_ROOT}/sub/$1 -d
    RewriteRule ^((?!sub/).*[^/])$ /$1/  [R=301,L]
    

    As an additional optimisation, you can avoid unnecessary filesystem checks (which are relatively expensive) on static assets (that naturally do not end in a trailing slash) by excluding URLs that look like they have a file extension. (This assumes you don't have physical directories whose names look like they have a file extension, e.g. /sub/somedir.xyz.)

    Add the following as the 2nd condition (before the filesystem check) in the above rule:

    RewriteCond %{REQUEST_URI} !\.\w{2,4}$
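
    Putting the two pieces together, the complete redirect might look like the sketch below. It simply combines the rule and the extra condition shown above, and assumes (as noted) that no directory under /sub has a name that looks like a file extension:

    ```apache
    # Redirect to append trailing slash if exists as a dir inside "/sub"
    RewriteCond %{HTTP_HOST} ^sub\.mydomain\.com [NC]
    # Cheap string test first: skip anything that looks like a file (e.g. .css, .jpg)
    RewriteCond %{REQUEST_URI} !\.\w{2,4}$
    # Only then hit the filesystem to see whether the path is a directory under /sub
    RewriteCond %{DOCUMENT_ROOT}/sub/$1 -d
    RewriteRule ^((?!sub/).*[^/])$ /$1/  [R=301,L]
    ```

    Conditions are evaluated in order and processing stops at the first one that fails, which is why the extension test is placed before the -d check.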
    

    Aside:

    RewriteCond %{HTTP_HOST} ^sub.mydomain.com$ [NC]
    RewriteRule ^((?!sub).*)$ /sub/$1  [NC]
    

    You should probably be using the L flag on this RewriteRule directive. (And the NC flag should be unnecessary.)

    The regex ^((?!sub).*)$ excludes any URL-path that simply starts with sub, which would include /subfoo and /subbar, etc. (which naturally prevents these directories from being accessible in the /sub directory). Any request that has already been rewritten starts with /sub/ (with a trailing slash), so the slash should be included in the negative lookahead, as I did in the rule above.
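
    With those points applied, the internal rewrite could be written roughly as follows. This is a sketch rather than a drop-in replacement; escaping the dots in the hostname is an extra tidy-up of mine, for consistency with the redirect rule above:

    ```apache
    # Internally rewrite everything on the subdomain to the /sub directory
    RewriteCond %{HTTP_HOST} ^sub\.mydomain\.com$ [NC]
    # (?!sub/) only skips paths already inside /sub/, so /subfoo is still rewritten
    RewriteRule ^((?!sub/).*)$ /sub/$1  [L]
    ```

    The NC flag remains on the condition because hostnames are case-insensitive; it is only dropped from the RewriteRule.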

    If you are not doing so already, consider also redirecting to remove /sub/ from direct requests, in case this directory is ever exposed/discovered.
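
    One possible way to do that, as a minimal sketch: check the REDIRECT_STATUS environment variable, which is empty on the initial request and set once an internal rewrite has happened, so that only direct requests for /sub/... are redirected. This would need to go before the internal rewrite, and it assumes you have no genuine /sub/sub/ directory that must remain reachable:

    ```apache
    # Redirect direct requests for /sub/<anything> back to /<anything>
    # REDIRECT_STATUS is empty only on the initial (not yet rewritten) request
    RewriteCond %{ENV:REDIRECT_STATUS} ^$
    RewriteCond %{HTTP_HOST} ^sub\.mydomain\.com [NC]
    RewriteRule ^sub/(.*) /$1  [R=301,L]
    ```

    Test with a 302 first to avoid the browser caching a wrong redirect.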