apache, .htaccess, http-status-code-404, custom-error-pages

.htaccess: ban a bot only on URLs with query parameters


Google is crawling my pages with query parameters, and I need to block it.

I want to return a 404 for every URL that has a parameter, such as site.com?q=text or site.com/?q=text,

but not block the plain URL site.com.

I wrote this for .htaccess:

ErrorDocument 403 "Your connection was rejected"
ErrorDocument 404 /404.shtml


RewriteEngine On
#RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{HTTP_USER_AGENT} (Googlebot) [NC]
RewriteCond %{REQUEST_URI} ^/q= [NC]
RewriteRule ^ - [F,L]

But I have two problems. First: how do I match the parameters?

Second: when a request is blocked, my 404 page is not shown. Instead I get:

Not Found
The requested URL was not found on this server.

Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.

But I set ErrorDocument 404 /404.shtml. Why can't Apache find 404.shtml? If I request a page that does not exist, 404.shtml is displayed normally.


Solution

  • First, you need to match the query string against QUERY_STRING, not REQUEST_URI.

    Moreover, you are getting this error because the query string is carried over to the error-document URL, i.e. /404.shtml?q=text, after the 404 redirect, so your rule matches again and blocks that URL too.

    Ideally you should return 403 Forbidden, like this:

    RewriteEngine On
    
    RewriteCond %{HTTP_USER_AGENT} (Googlebot) [NC]
    RewriteCond %{QUERY_STRING} ^q= [NC]
    RewriteRule ^ - [F]
    

    However, if you must return a 404, use this:

    RewriteEngine On
    
    RewriteCond %{HTTP_USER_AGENT} (Googlebot) [NC]
    RewriteCond %{QUERY_STRING} ^q= [NC]
    RewriteRule !^404\.shtml$ - [R=404,NC,L]
    

    This executes the rule for all URLs except /404.shtml.

    You may also check for REDIRECT_STATUS like this:

    RewriteEngine On
    
    RewriteCond %{ENV:REDIRECT_STATUS} ^$
    RewriteCond %{HTTP_USER_AGENT} (Googlebot) [NC]
    RewriteCond %{QUERY_STRING} ^q= [NC]
    RewriteRule ^ - [R=404,L]
    

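    This executes the rule for the original request only, because REDIRECT_STATUS is empty until an internal redirect (such as the one to the error document) has occurred.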
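
    As a rough sketch, the pieces above could be combined into one .htaccess file as follows. It assumes the /404.shtml page from the question exists at the document root. The (^|&)q= pattern and the commented-out alternative are assumptions that go beyond the rules above: the first also matches q when it is not the first parameter, and the second matches any non-empty query string.

    ErrorDocument 404 /404.shtml

    RewriteEngine On

    # Only act on the original request, not on the internal request for /404.shtml
    RewriteCond %{ENV:REDIRECT_STATUS} ^$
    RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
    # Match q= at the start of the query string or after an &
    RewriteCond %{QUERY_STRING} (^|&)q= [NC]
    # Alternative (assumption): match any non-empty query string instead
    # RewriteCond %{QUERY_STRING} .
    RewriteRule ^ - [R=404,L]

    A request for site.com with no query string does not satisfy the QUERY_STRING condition, so it should be served normally.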