After a site move, I want to remove the file extension (if any) and the query string (if any), leaving just the file name while keeping the path:
https://www.example.com/blog/anyfile.html
301 to >> https://www.example.com/blog/anyfile
https://example.com/blog/anyfile.html/amp
301 to >> https://www.example.com/blog/anyfile
https://www.example.com/blog/anyfile.html/amp?nonamp=1
301 to >> https://www.example.com/blog/anyfile
I tried something like this, but it doesn't keep the /blog/ folder:
RewriteEngine On
RewriteCond %{REQUEST_URI} ^/blog/
RewriteRule ^.*/([^/]+)\.html$ /$1? [L,NC,R]
Also, I can't find a way to remove /amp after .html.
Near the top of the root .htaccess file you could do something like the following to discard .html, .html/amp and .html/<anything> from the end of the URL-path, and discard the query string (if any) at the same time:
# Strip ".html" onwards from the end of the URL (and remove query string)
RewriteRule ^(.*)\.html(/.*)?$ https://www.example.com/$1 [QSD,R=301,L]
The QSD (Query String Discard) flag, available on Apache 2.4+, is preferable to appending an empty query string (a trailing ? on the substitution) in order to remove the query string.
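For reference, a minimal sketch of the older alternative, assuming a server where QSD is unavailable (Apache 2.2): the trailing ? on the substitution appends an empty query string, which removes the original one. The rest of the rule is unchanged.

```apache
# Apache 2.2 fallback (assumption: QSD is not available on this server).
# The trailing "?" replaces the original query string with an empty one.
RewriteRule ^(.*)\.html(/.*)?$ https://www.example.com/$1? [R=301,L]
```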
You need to hardcode the scheme + hostname if you wish to satisfy your second example and redirect from example.com to www.example.com. This could be generalised (without hardcoding the domain) if we know that your site is only accessible via the www subdomain or the domain apex, and only serves this single domain.
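As a rough sketch of that generalisation, assuming the site is only reached on the apex or www hostname, always over HTTPS, and that the Host header carries no port, the hostname could be captured in a condition and reused via %1:

```apache
# Capture the requested hostname minus any leading "www." (and any trailing dot).
RewriteCond %{HTTP_HOST} ^(?:www\.)?(.+?)\.?$ [NC]
# %1 is the hostname captured above; $1 is the URL-path before ".html".
RewriteRule ^(.*)\.html(/.*)?$ https://www.%1/$1 [QSD,R=301,L]
```

This has not been tested against your setup; hardcoding the hostname remains the simpler option for a single site.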
However, the above won't catch URLs that only include a query string, but don't contain .html in the URL-path. For that you could implement an additional rule, following the rule above:
# Strip the query string from any URL.
RewriteCond %{QUERY_STRING} .
RewriteRule ^ https://www.example.com%{REQUEST_URI} [QSD,R=301,L]
A look at your existing rule:
RewriteCond %{REQUEST_URI} ^/blog/
RewriteRule ^.*/([^/]+)\.html$ /$1? [L,NC,R]
You are only capturing the filename (anyfile in your example) and discarding the URL-path that precedes this (i.e. blog/). So the $1 backreference only contains anyfile. This also only matches URLs that end in .html and not .html/amp.
Checking the URL-path in the RewriteCond directive is superfluous, since the same check can be made in the RewriteRule pattern itself.
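If the redirect should only apply under /blog/ (an assumption; the rule earlier in this answer applies site-wide), the path check could go in the RewriteRule pattern itself, keeping the prefix inside the captured group so that it survives the redirect. A sketch:

```apache
# Assumption: only URLs under /blog/ should be redirected.
# In a root .htaccess the pattern is matched without the leading slash.
RewriteRule ^(blog/.+)\.html(/.*)?$ https://www.example.com/$1 [QSD,R=301,L]
```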