Today I stumbled upon a file on my web host called 'error.log'. I thought I'd take a look.
I see multiple 'file does not exist' errors; there are three types of entries, for robots.txt, apple-touch-icon-precomposed.png, and missing.html.
I have some guesses about what these files are used for, but would like to know definitively:
A robots.txt file is read by web crawlers/robots to find out which resources on your server they are allowed or disallowed to scrape. Reading it is not mandatory for a robot, but the nice ones do. There are some further examples at http://en.wikipedia.org/wiki/Robots.txt. The file resides in the web root directory and may look like:
User-agent: * # All robots
Disallow: / # Do not enter website
or
User-agent: googlebot # For this robot
Disallow: /something # Do not enter
The apple-touch-icon-precomposed.png file is requested by iOS devices when a user saves your site to their home screen; it is explained in more detail at https://stackoverflow.com/a/12683605/722238.
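If you'd rather the device use an icon you control than probe the default filenames, you can declare one explicitly in each page's <head>. A minimal sketch; the path and filename here are illustrative:

<!-- In the page's <head>; iOS generally uses a declared icon instead of probing defaults -->
<link rel="apple-touch-icon-precomposed" href="/apple-touch-icon-precomposed.png">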
I believe missing.html
is used by some as a customized 404 page. It's possible that a robot has been configured to request this file, hence the requests for it.
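If your server is Apache (the 'file does not exist' phrasing in error.log suggests it is), a customized 404 page is typically wired up with an ErrorDocument directive. A minimal sketch, assuming a .htaccess file in the web root and that /missing.html actually exists:

# .htaccess — serve /missing.html for any request that 404s
ErrorDocument 404 /missing.html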
You should add a robots.txt file if you want to control the resources a robot will scrape from your server. As said before, it's not mandatory for a robot to read this file.
If you wanted to add the other two files to remove the error messages, you could; however, I don't believe it is necessary. There is nothing to stop joe_random from making a request on your server for /somerandomfile.txt, in which case you will get another error message for another file that doesn't exist. You could then just redirect such requests to a customized 404 page.
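If you do decide to quiet the log, empty placeholder files are enough to stop the 404s, though they serve no useful content. A sketch assuming shell access to the web root:

# Run from the web root; creates empty placeholders so these requests return 200 instead of 404
touch robots.txt apple-touch-icon-precomposed.png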