Tags: apache, .htaccess, url-rewriting, litespeed

LiteSpeed (or Apache) rewrite to hide the subfolders domain.com and www.domain.com on mirror.com


I've mirrored a website with HTTrack (wget doesn't support multiple connections).

The problem is that this site serves resources from two domains at the same time:

  1. domain.com
  2. www.domain.com

So my setup is a root folder /var/www/mirror/ with the subfolders /var/www/mirror/domain.com/ and /var/www/mirror/www.domain.com/.

When you load the mirrored site's index on mirror.com, the URL you see is https://mirror.com/domain.com/, but you are sent to https://mirror.com/www.domain.com/ as soon as you click on any content (see the PS at the end).

I've managed to hide one of the subfolders when you load the index in /var/www/mirror/index.html (going to mirror.com) with this code:

RewriteCond %{REQUEST_URI} !^/domain.com/
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ /domain.com/$1
RewriteRule ^(/)?$ domain.com/index.html [L]

But when I add the same for the second subdomain (www.):

RewriteCond %{REQUEST_URI} !^/domain.com/
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ /domain.com/$1
RewriteRule ^(/)?$ domain.com/index.html [L]
RewriteRule ^(.*)$ /www.domain.com/$1
RewriteRule ^(/)?$ www.domain.com/index.html [L]

the page breaks (error loading stylesheets, resources, etc).

I've tried leaving [L] only on the last rule, with either domain.com or www.domain.com in the last RewriteRule, but nothing works.

PS: On the real site, only the index is served from domain.com; as soon as you click on any link, you are redirected and keep browsing on www.domain.com from then on. I would love to have my mirror as domain.com always, even when my mirrored resources link into the /var/www/mirror/www.domain.com subfolder, if that's possible.


Edit to add some examples:

  1. When I load mirror.com, I want the index to be at mirror.com/ and nothing else (this works with my first .htaccess example).
  2. When I click on any link, the mirrored content sends me to mirror.com/www.domain.com/someContent.html, but I want the browser's URL to show mirror.com/someContent.html.
  3. If I load a real subfolder, e.g. mirror.com/www.domain.com/tags/someContent.html, I want the URL to show mirror.com/tags/someContent.html.

Solution

  • Apache mod_rewrite cannot see the content of the page, i.e. it cannot alter URL links contained within the page. You could try using Apache mod_proxy_html, which can modify URL links contained within the page. See below for further info.

    http://apache.webthing.com/mod_proxy_html/
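
As a complement, and not part of the answer above: mod_rewrite cannot change the links inside the mirrored HTML, but it can still change what the address bar shows by pairing an external redirect with an internal rewrite. The following is a minimal, untested sketch for /var/www/mirror/.htaccess. It assumes the folder names from the question, that the document root of mirror.com is /var/www/mirror, and that LiteSpeed handles these directives the same way Apache does. It only maps files that exist (directory URLs such as /tags/ would need an extra rule), and when a file exists in both subfolders the domain.com copy wins.

```apache
RewriteEngine On

# 1. External redirect: a browser request for /domain.com/x or
#    /www.domain.com/x is sent to /x. THE_REQUEST holds the original
#    request line, so the internal rewrites below cannot retrigger this rule.
#    The capture stops at "?" so the query string is passed through as usual.
RewriteCond %{THE_REQUEST} \s/(?:www\.)?domain\.com/([^\s?]*)
RewriteRule ^ /%1 [R=302,L]

# 2. Serve the site root from the domain.com copy of the index.
RewriteRule ^$ /domain.com/index.html [L]

# 3. Internal rewrite: use the domain.com copy when that file exists.
#    The negative lookahead skips paths that are already rewritten.
RewriteCond %{DOCUMENT_ROOT}/domain.com/$1 -f
RewriteRule ^(?!(?:www\.)?domain\.com/)(.+)$ /domain.com/$1 [L]

# 4. Otherwise fall back to the www.domain.com copy.
RewriteCond %{DOCUMENT_ROOT}/www.domain.com/$1 -f
RewriteRule ^(?!(?:www\.)?domain\.com/)(.+)$ /www.domain.com/$1 [L]
```

With these rules, clicking a link to mirror.com/www.domain.com/tags/someContent.html should cause one redirect to mirror.com/tags/someContent.html, which is then served from whichever subfolder contains the file. The 302 is deliberate so that a mistake is not cached by browsers; it could be changed to 301 once the behaviour is confirmed.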