Can you rename robots.txt and favicon?


I want the following file names on my server, so that all server setup and crawler files start with a . and show up first in the file listing, with my webpage files after them:

.favicon.ico
.htaccess
.robots.txt
.sitemap.xml
index.php
contact.php

Here are the contents of .htaccess:

Redirect 301 "/robots.txt" "/.robots.txt"
Redirect 301 "/favicon.ico" "/.favicon.ico"

Here are the contents of the file ".robots.txt":

User-Agent: *
Sitemap: http://example.com/.sitemap.xml
Allow: https://example.com/index.php
Allow: https://example.com/contact.php

Is this okay? Will everything run properly? What about the site icon? Is that set up okay? Thanks in advance!


Solution

  • Yes, you can rename robots.txt and favicon.ico. However, you should implement this as an internal rewrite, not an external redirect (which creates an unnecessary additional request). I would also consider using a prefix character other than a dot. A leading dot generally indicates a "hidden/protected" file (as in .htaccess), so these files may not display by default in FTP clients, and there might already be directives in the server config that prevent access to dot-files. (Perhaps use @ instead, which I'll use in the following examples.)
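
    For illustration only (this is not taken from the asker's server), a server or virtual host config that denies access to all dot-files might contain a block like the following, in which case a request that maps to .robots.txt would get a 403 response. Apache 2.4 syntax is assumed:

    # Hypothetical server-config block: deny every file whose name starts with a dot
    <FilesMatch "^\.">
        Require all denied
    </FilesMatch>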

    You wouldn't normally rename robots.txt (or favicon.ico) simply to affect the ordering of files in the directory, and doing so could be confusing for other developers. However, "renaming" the robots.txt file is quite typical when you need to conditionally serve different robots.txt files depending on elements of the request (e.g. you have multiple domains served from the same hosting account and need a different robots.txt file for each, or you need to disallow crawling of some domains and allow others).

    To rewrite the request you need to use mod_rewrite instead of the mod_alias Redirect directive. For example:

    RewriteEngine On
    
    # Internally rewrite "favicon.ico" to "@favicon.ico" and "robots.txt" to "@robots.txt"
    RewriteRule ^favicon\.ico$ @favicon.ico [L]
    RewriteRule ^robots\.txt$ @robots.txt [L]
    

    This can be "simplified" to a single rule, avoiding repetition:

    # Rewrite "robots.txt" or "favicon.ico" to the same name with an "@" prefix
    RewriteRule ^(robots\.txt|favicon\.ico)$ @$1 [L]
    

    $1 is a backreference to the URL-path captured by the preceding RewriteRule pattern, i.e. robots.txt or favicon.ico, depending on the request.

    To the user-agent (browser / search-engine bot), the "internal rewrite" is invisible. It is as if the files were called robots.txt and favicon.ico respectively.
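
    The same kind of internal rewrite covers the conditional case mentioned earlier, where different hosts need different robots.txt files. A minimal sketch, assuming a made-up hostname staging.example.com and a made-up file @robots-disallow.txt that disallows all crawling:

    RewriteEngine On

    # Hypothetical: on the staging hostname only, serve the "disallow all" file
    RewriteCond %{HTTP_HOST} ^staging\.example\.com$ [NC]
    RewriteRule ^robots\.txt$ @robots-disallow.txt [L]

    A rule like this would need to go before the general robots.txt rewrite. Requests for robots.txt on any other hostname do not satisfy the condition and fall through to the rules that follow.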

    However, consider blocking direct access to these @-files as well, i.e. redirect the request from @robots.txt to robots.txt. For example, the following would need to go before the above rewrite:

    # Redirect from "@robots.txt" to "robots.txt" (and likewise for "@favicon.ico")
    RewriteCond %{ENV:REDIRECT_STATUS} ^$
    RewriteRule ^@(robots\.txt|favicon\.ico)$ /$1 [R=301,L]
    

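    The check against the REDIRECT_STATUS environment variable ensures that only direct requests from the client are processed and not requests that have been internally rewritten by the above rewrite. REDIRECT_STATUS is empty on the initial request and is set (to 200) after the first successful internal rewrite, so the condition only matches on the first pass.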

    If you do the same with sitemap.xml (i.e. all @-files) then the rule can be simplified, as you only need to check for the @ prefix and don't need to explicitly match the filename. For example:

    # Redirect and remove the "@" prefix from all requests
    RewriteCond %{ENV:REDIRECT_STATUS} ^$
    RewriteRule ^@([^/]+)$ /$1 [R=301,L]
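
    Putting the pieces together, a complete .htaccess for the three files might look like the following sketch. It assumes the files on disk are named @robots.txt, @favicon.ico and @sitemap.xml, that the .htaccess file is in the document root, and that no other rules conflict:

    RewriteEngine On

    # 1. Externally redirect direct requests for "@<file>" to "<file>"
    RewriteCond %{ENV:REDIRECT_STATUS} ^$
    RewriteRule ^@([^/]+)$ /$1 [R=301,L]

    # 2. Internally rewrite the public names to the "@"-prefixed files
    RewriteRule ^(robots\.txt|favicon\.ico|sitemap\.xml)$ @$1 [L]

    The order matters: the redirect comes first so that direct requests for the @-names are caught before the rewrite. With this setup, the Sitemap line in robots.txt would point at the public /sitemap.xml URL rather than the prefixed file name.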
    

    Instead of renaming the files, move the other (content) files

    Alternatively, instead of renaming these files in the root, why not keep all your content (.php files) in a subdirectory (e.g. /content) and rewrite all requests for .php files in the root to that subdirectory? For example:

    # Internally rewrite "contact.php" to "/content/contact.php"
    RewriteCond %{DOCUMENT_ROOT}/content/$1 -f
    RewriteRule ^([^/]+\.php)$ content/$1 [L]
    

    The preceding condition (the RewriteCond directive) checks that the target file exists in the /content subdirectory before rewriting the request. It is then a simple task to remove the .php extension from your visible URLs, for example by replacing the above rule with:

    # Internally rewrite "contact" to "/content/contact.php"
    RewriteCond %{DOCUMENT_ROOT}/content/$1.php -f
    RewriteRule ^([^/]+)$ content/$1.php [L]
    

    robots.txt format

    User-Agent: *
    Sitemap: http://example.com/.sitemap.xml
    Allow: https://example.com/index.php
    Allow: https://example.com/contact.php
    

    The Allow directives are not required here (since "allow" is the default). More importantly, the Allow (and Disallow) directives take a root-relative URL-path, not an absolute URL, i.e. it should be /index.php, not https://example.com/index.php. The latter would not do anything since it will never match. (You are also mixing http and https here.)