html · seo · search-engine · bots · web-crawler

Disallow opening and crawling HTML files


I have HTML files in the root and also in a separate directory; they are included via SSI into other pages.

  1. How can I prevent these HTML files from being opened via direct URLs while still keeping them accessible to SSI? Via .htaccess or something else? Is it possible at all?

  2. How can I prevent search engine bots from crawling these HTML files? If they are only included via SSI on other pages and there are no direct links to them anywhere on the site, will search engine bots see them at all?


Solution

  • Create a robots.txt in the site root and add the following:

    User-agent: *
    Disallow: /foldername-you-want-to-disallow/ # keeps compliant bots out of this whole directory
    Disallow: /hidden.html # blocks crawling of a specific file in the root dir
    Disallow: /foldername/hidden.html # blocks crawling of a specific file in a subdir
    

    OR

    You could create an .htaccess file and upload it into the directory you want to hide. Include the following (see the Apache 2.4 note below):

    # Disable directory listings
    Options -Indexes

    # Deny all direct HTTP requests (Apache 2.2 access-control syntax)
    Order deny,allow
    Deny from all
    

    You will still be able to include them via SSI, but any direct HTTP requests will be denied.
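    For context, here is a minimal sketch of how such a file would still be pulled into the including pages, assuming a hypothetical include at /includes/hidden.html and a page processed by mod_include (e.g. a .shtml page):

    <!--#include virtual="/includes/hidden.html" -->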
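    Note that Order deny,allow / Deny from all is the Apache 2.2 access-control syntax (on Apache 2.4 it only works if mod_access_compat is loaded). If your server runs Apache 2.4, a sketch of the equivalent .htaccess would be:

    # Disable directory listings
    Options -Indexes

    # Apache 2.4 syntax: deny all direct HTTP requests
    Require all denied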