
How to stop search engines from crawling the whole website?


I want to stop search engines from crawling my whole website.

I have a web application for members of a company to use. This is hosted on a web server so that the employees of the company can access it. No one else (the public) would need it or find it useful.

So I want to add another layer of security (in theory) against unauthorized access by keeping all search engine bots/crawlers away from the site entirely. Having Google index our site to make it searchable is pointless from a business perspective, and it just gives a hacker another way to find the website in the first place and try to break into it.

I know that in robots.txt you can tell search engines not to crawl certain directories.

Is it possible to tell bots not to crawl the whole site without having to list all the directories not to crawl?

Is this best done with robots.txt, or is it better done with .htaccess or something else?


Solution

  • It is best handled with a robots.txt file, though this only works for bots that respect the file.

    To block the whole site, add this to the robots.txt file in your site's root directory:

    User-agent: *
    Disallow: /
    

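    Keep in mind that robots.txt is only a request: it does not stop a person, or a bot that ignores it, from loading your pages. To actually restrict access for everyone else, .htaccess is better, but you need to define access rules, for example by IP address.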

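    Below are .htaccess rules that block everyone except requests from your company's IP address. The order must be deny,allow so that the Allow line overrides "Deny from all"; with "Order allow,deny", "Deny from all" would block your own people as well:

    # Apache 2.2 syntax; on Apache 2.4 it needs mod_access_compat
    # (the native 2.4 equivalent is "Require ip 255.1.1.1")
    Order deny,allow
    Deny from all
    # Enter your company's IP address here
    Allow from 255.1.1.1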
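To see what a compliant crawler makes of the two-line robots.txt (`User-agent: *` / `Disallow: /`), here is a small sketch that uses Python's standard-library `urllib.robotparser` as a stand-in for a well-behaved bot. Real crawlers such as Googlebot use their own parsers, and the host `intranet.example.com` is hypothetical. The point it illustrates is that `Disallow: /` covers every path, so there is no need to list directories one by one.

```python
from urllib.robotparser import RobotFileParser

# The whole-site block from the answer: every user agent, every path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def can_crawl(user_agent: str, url: str) -> bool:
    """Return True if a crawler honouring ROBOTS_TXT may fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical host and paths, for illustration only.
    for agent, url in [
        ("Googlebot", "https://intranet.example.com/"),
        ("Bingbot", "https://intranet.example.com/reports/2024.html"),
    ]:
        print(agent, url, can_crawl(agent, url))
```

Both lookups should come back `False`. This only models bots that choose to obey the file; it says nothing about who can actually load the pages.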
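Why the `Order` line matters can be shown with a simplified model of the Apache 2.2 `Order`/`Allow`/`Deny` evaluation rules (as documented for mod_access_compat). With `Order allow,deny`, a request must match an Allow and must not match any Deny, so a client matching both `Allow from <ip>` and `Deny from all` is rejected. With `Order deny,allow`, a matching Allow overrides a matching Deny. This is a sketch, not Apache's code: it only understands `all` and plain IP/CIDR arguments, and the outside address `198.51.100.7` is a hypothetical visitor.

```python
import ipaddress


def _matches(rule: str, client_ip: str) -> bool:
    """True if one Allow/Deny argument matches the client address.

    Only "all" and IP addresses/CIDR ranges are handled; Apache also
    accepts hostnames and partial addresses, which this sketch ignores.
    """
    if rule.lower() == "all":
        return True
    network = ipaddress.ip_network(rule, strict=False)
    return ipaddress.ip_address(client_ip) in network


def is_allowed(order: str, allow_from: list[str], deny_from: list[str],
               client_ip: str) -> bool:
    """Simplified model of Apache 2.2's Order/Allow/Deny decision."""
    allowed = any(_matches(r, client_ip) for r in allow_from)
    denied = any(_matches(r, client_ip) for r in deny_from)
    if order == "allow,deny":
        # Must match an Allow AND match no Deny; unmatched requests are denied.
        return allowed and not denied
    if order == "deny,allow":
        # A matching Allow overrides a matching Deny; unmatched requests pass.
        return allowed or not denied
    raise ValueError(f"unsupported Order: {order!r}")


if __name__ == "__main__":
    company_ip = "255.1.1.1"      # placeholder address from the answer
    outsider_ip = "198.51.100.7"  # hypothetical outside visitor
    # Order allow,deny + Allow from <ip> + Deny from all: company locked out too.
    print(is_allowed("allow,deny", [company_ip], ["all"], company_ip))
    # Order deny,allow + Deny from all + Allow from <ip>: only the company gets in.
    print(is_allowed("deny,allow", [company_ip], ["all"], company_ip))
    print(is_allowed("deny,allow", [company_ip], ["all"], outsider_ip))
```

An alternative that also works is to keep `Order allow,deny` and simply drop `Deny from all`, since under that order anything not explicitly allowed is already denied.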