
.htaccess mod_rewrite - How to redirect both: "/page.html" and "/page/" to: "/page"?


I need some help with my .htaccess file. I've been messing with it for about a week but can't get it to work. I've scoured the web for solutions, but nothing really addresses my case.

In general, requesting website.com/info.html?page=12 should redirect so that the browser shows website.com/info/12, and the other way around: a request for website.com/info/12 should be served by website.com/info.html?page=12.

Similarly, without the QUERY_STRING part: website.com/info.html should redirect to website.com/info, and website.com/info should be served by website.com/info.html.

However, website.com/info/ (only when nothing follows the trailing slash) should redirect to website.com/info.

How can I achieve this? It should be simple, but I can't get my head around it yet. Any tips are greatly appreciated!

UPDATE:

Thanks for the answers and comments. They certainly shed some light on the subject. I'm almost there...

For starters, if anyone has trouble understanding regex, these tools helped me immensely:

https://regexr.com

https://regex101.com

And a similar one for testing .htaccess files:

https://htaccess.madewithlove.com

Some clarification:

I need general .html hiding for all pages, but only one specific redirect: from /info/ (there's no directory with this name) to /info.

I read the ?page=12 parameter with JS (window.location.search), but I don't think it matters, since I can replicate the error on a page without it.

Now my .htaccess file looks like this:

RewriteEngine on

# Externally redirect: "/info.html?page=YYY" to: "/info/YYY" - WORKS
RewriteCond %{QUERY_STRING} page=(\d+) [NC]
RewriteRule ^info\.html$ /info/%1? [NC,R=302,NE,L]

# Externally redirect: "/XXX.html" to: "/XXX" - WORKS
RewriteCond %{THE_REQUEST} /(.+)\.html [NC]
RewriteRule ^ /%1 [NC,R=302,NE,L]

# Externally redirect: "/info/" to: "/info" - WORKS
RewriteRule ^info/$ /info [NC,R=302,L]

# Internally rewrite: "/info/YYY" to: "/info.html?page=YYY" - INTERNAL SERVER ERROR 
RewriteRule ^info/(\d+) /info.html?page=$1 [L]

# Internally rewrite: "/XXX" to: "/XXX.html" - WORKS
RewriteCond %{REQUEST_FILENAME}.html -f [NC]
RewriteRule ^ %{REQUEST_URI}.html [NC,L]

As stated in the comments, all the rules work except the 4th one.

However, if I remove the 1st one, the 4th works.

What am I missing?


Solution

  • # Internally rewrite: "/info/YYY" to: "/info.html?page=YYY" - INTERNAL SERVER ERROR 
    RewriteRule ^info/(\d+) /info.html?page=$1 [L]
    

    I read ?page=12 parameter with JS window.location.search but I don't think it matters

    Yes, it matters. You can't internally rewrite (on the server) to append a query string and expect client-side JS to be able to see that query string. The "internal rewrite" is entirely internal to the server. JS only sees the response sent back from the server, which does not include a query string (unless it was present on the initial request or you externally redirected to it).

    Instead, you can internally rewrite a request for /info/12 to /info.html only (no query string) and use window.location.pathname to parse the URL-path (i.e. /info/12) and extract the page number from the second path segment (if any).
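
    A minimal sketch of that parsing step, assuming the page number is always the second path segment and consists of digits only. The helper name getPageFromPath is hypothetical, not part of any library. In the browser you would pass it window.location.pathname; the example uses the standard URL class (same pathname/search properties) so it also runs under Node.

    ```javascript
    // Hypothetical helper: extract the page number from a URL-path such as "/info/12".
    // Returns a number, or null when the second segment is missing or not all digits.
    function getPageFromPath(pathname) {
      // "/info/12" -> ["", "info", "12"] -> ["info", "12"]
      const segments = pathname.split("/").filter(Boolean);
      const page = segments[1];
      // Digits only, mirroring the \d+ in the RewriteRule pattern
      return page !== undefined && /^\d+$/.test(page) ? Number(page) : null;
    }

    // In the browser: getPageFromPath(window.location.pathname)
    const url = new URL("https://website.com/info/12");
    console.log(url.search === "");            // true: no query string reaches the client
    console.log(getPageFromPath(url.pathname)); // 12
    console.log(getPageFromPath("/info"));      // null
    ```

    The helper does not check the first segment, on the assumption that it only runs on the page served by info.html.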

    With regard to your "error": the first rule redirects the newly rewritten URL during the second pass by the rewrite engine, which would result in a redirect loop. The last rule could also result in an internal rewrite loop (internal server error) under certain conditions, since the condition using REQUEST_FILENAME is not testing the same thing as REQUEST_URI. You also need to ensure that MultiViews is disabled in this scenario, as it will otherwise conflict with your rewrites.
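
    The loop through the first and fourth rules can be traced with a deliberately simplified toy model. This is not how Apache is implemented; it only mimics the behaviour described here, where in .htaccess context an internal rewrite with [L] sends the rewritten URL through the rule set again. The function names and the pass limit are illustrative assumptions.

    ```javascript
    // Toy model (NOT real Apache) of the OP's rules 1 and 4, applied to a
    // per-directory URL-path (no leading slash) plus its query string.
    function runPass(path, query) {
      // Rule 1: externally redirect "info.html?page=N" to "/info/N"
      const q = /page=(\d+)/i.exec(query);
      if (/^info\.html$/i.test(path) && q) {
        return { type: "redirect", location: `/info/${q[1]}` };
      }
      // Rule 4: internally rewrite "info/N" to "info.html?page=N"
      const p = /^info\/(\d+)/.exec(path);
      if (p) {
        return { type: "rewrite", path: "info.html", query: `page=${p[1]}` };
      }
      return { type: "serve", path, query };
    }

    // After an internal rewrite with [L], the rewritten URL runs through the
    // rules again; a redirect ends processing and goes back to the browser.
    function handleRequest(path, query = "") {
      const trace = [];
      for (let pass = 0; pass < 10; pass++) { // arbitrary safety cap
        const result = runPass(path, query);
        trace.push(result.type);
        if (result.type !== "rewrite") return { result, trace };
        ({ path, query } = result);
      }
      throw new Error("internal rewrite loop");
    }

    const { result, trace } = handleRequest("info/12");
    console.log(trace.join(" -> ")); // rewrite -> redirect
    console.log(result.location);    // /info/12, the URL the browser just asked for
    ```

    Each time the browser follows that redirect, the same two steps repeat, which is why removing the first rule makes the fourth one appear to work.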

    For some reason, you've resorted to using an empty query string (trailing ?) to discard the query string in the first rule. If you are on Apache 2.4 (as I would assume you are), then you should be using the QSD (Query String Discard) flag instead, as used in @arkascha's answer. That said, your erroneous use of the NC flag on the condition that checks against -f makes me doubt your Apache version.

    I assume that you are already linking to the canonical URL internally in your app, e.g. /info/12 (not /info.html?page=12), and that these redirects are only for SEO (if you are changing an existing URL structure) or for users who happen to type the old (non-canonical) URLs.

    For example, try the following instead:

    # Ensure that MultiViews is disabled
    Options -MultiViews
    
    RewriteEngine on
    
    # Externally redirect: "/info.html?page=<num>" to: "/info/<num>"
    RewriteCond %{QUERY_STRING} ^page=(\d+)$ [NC]
    RewriteRule ^info\.html$ /info/%1 [NC,QSD,R=302,L]
    
    # Externally redirect: "/XXX.html" to: "/XXX"
    RewriteRule ^(.+)\.html$ /$1 [NC,R=302,L]
    
    # Externally redirect: "/info/" to: "/info"
    RewriteRule ^info/$ /info [NC,R=302,L]
    
    # Internally rewrite: "/info/<num>" to: "/info.html"
    RewriteRule ^info/\d+$ info.html [END]
    
    # Internally rewrite: "/XXX" to: "/XXX.html" if it exists
    RewriteCond %{DOCUMENT_ROOT}/$1.html -f
    RewriteRule (.+) $1.html [END]
    

    I've used END (requires Apache 2.4) on the last two rules to prevent a second pass by the rewrite engine, which would otherwise result in a redirect loop.
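
    A deliberately simplified toy model (not Apache's real implementation) of the second, fourth and fifth rules of the suggested config shows the difference between L and END. The first and third rules are left out because they never fire for these requests. It assumes a document root containing only info.html; the function names are illustrative.

    ```javascript
    // Toy model (NOT real Apache) of rules 2, 4 and 5 of the suggested config.
    // Assumption: the document root contains only "info.html".
    const files = new Set(["info.html"]);

    function runPass(path, flag) {
      // Rule 2: externally redirect "XXX.html" to "/XXX"
      const m = /^(.+)\.html$/i.exec(path);
      if (m) return { type: "redirect", location: `/${m[1]}` };
      // Rule 4: internally rewrite "info/<num>" to "info.html"
      if (/^info\/\d+$/.test(path)) return { type: flag, path: "info.html" };
      // Rule 5: internally rewrite "XXX" to "XXX.html" if that file exists
      if (path && files.has(`${path}.html`)) return { type: flag, path: `${path}.html` };
      return { type: "serve", path };
    }

    // "L" ends the current pass, but the rewritten URL goes through the rules
    // again; "END" stops rewriting altogether and the file is served.
    function handleRequest(path, flag) {
      for (let pass = 0; pass < 10; pass++) { // arbitrary safety cap
        const result = runPass(path, flag);
        if (result.type === "L") { path = result.path; continue; }
        if (result.type === "END") return { type: "serve", path: result.path };
        return result;
      }
      throw new Error("internal rewrite loop");
    }

    console.log(handleRequest("info/12", "END")); // serves info.html
    console.log(handleRequest("info/12", "L"));   // redirects to /info
    console.log(handleRequest("info", "L"));      // redirects to /info again: a loop
    ```

    With L, the rewritten info.html matches the ".html" redirect on the next pass, so /info/12 is sent to /info, and /info is then redirected back to itself.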

    Also note the addition of the end-of-string anchor ($) on the original regex ^info/(\d+), otherwise you are allowing URLs of the form /info/12<anything>.
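
    A quick way to see the difference is to test both patterns against a few sample URL-paths (made up for illustration). For patterns this simple, JavaScript regexes behave the same as the PCRE syntax mod_rewrite uses. In .htaccess the pattern is matched against the URL-path without its leading slash, which is why both start with ^info.

    ```javascript
    // The OP's original pattern (no end anchor) vs. the answer's anchored pattern
    const unanchored = /^info\/(\d+)/;
    const anchored = /^info\/\d+$/;

    // Hypothetical sample URL-paths, as RewriteRule sees them in .htaccess
    for (const path of ["info/12", "info/12abc", "info/12/extra"]) {
      console.log(`${path}: unanchored=${unanchored.test(path)}, anchored=${anchored.test(path)}`);
    }
    // info/12: unanchored=true, anchored=true
    // info/12abc: unanchored=true, anchored=false
    // info/12/extra: unanchored=true, anchored=false
    ```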

    I've removed the slash prefix on the substitution strings of the two rewrites as we are rewriting to a file-path, not a URL-path. By rewriting to a URL-path (slash prefix) you are (unnecessarily) forcing the rewrite engine to remap the URL to the filesystem.

    You have to be careful when checking against THE_REQUEST as this also includes the query string (if any), which you would need to avoid when checking for .html.
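
    For example, take a hypothetical request for /info?ref=old.html, where .html appears only in the query string. The sketch below uses plain JavaScript regexes as stand-ins for mod_rewrite's PCRE to compare the OP's THE_REQUEST condition with the answer's RewriteRule pattern, which only ever sees the URL-path.

    ```javascript
    // THE_REQUEST is the full request line, query string included
    const theRequest = "GET /info?ref=old.html HTTP/1.1";
    // In .htaccess, RewriteRule matches the URL-path only: no leading slash, no query string
    const urlPath = "info";

    const opCondition = /\/(.+)\.html/i; // OP's RewriteCond %{THE_REQUEST} pattern
    const answerRule = /^(.+)\.html$/i;  // answer's RewriteRule pattern

    const m = opCondition.exec(theRequest);
    console.log(m ? m[1] : null);          // info?ref=old (matched, although the path has no .html)
    console.log(answerRule.test(urlPath)); // false (no redirect)
    ```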

    And, as mentioned, you will need to use window.location.pathname in your client-side JS instead to parse the page number (if any) from the requested URL.

    Note that this can be further optimised/simplified if we know more about your system/URL structure. For instance, are you only dealing with URLs in the document root, as in your example, or are you also expecting /foo/bar/baz as well? Do your URLs contain dots? (That would otherwise delimit file extensions on your static resources.)