I'm using Apache 2.2.X and PHP 5.2.X (installed as an Apache module) to build a new website, and I would like your feedback on how I'm handling server errors.
I was thinking about using the same file as my homepage (/index.php) to show custom error messages.
This is my .htaccess setup:
ErrorDocument 400 /index.php?error=400
ErrorDocument 401 /index.php?error=401
ErrorDocument 403 /index.php?error=403
ErrorDocument 404 /index.php?error=404
ErrorDocument 500 /index.php?error=500
Now, in my index.php file I have some code that looks like this:
    if (isset($_GET['error'])) {
        DrawErrorPage($_GET['error']);
    } else {
        DrawHomepage();
    }
Everything works like a charm.
Well, everything except one thing that I can't fix: if I force Apache to respond with a 500 status code (for example, inserting malformed code into my .htaccess), I'm not being redirected to "/index.php?error=500", but I get the default 500 error page instead. With any other status code (for example, 403 or 404) my configuration works absolutely perfectly.
But now I'm having doubts, and I'm starting to think it would be better to use a separate page to handle errors (for example, "/error.php").
"DrawHomepage()" needs to set "robots" meta tag to "index, follow", while "DrawErrorPage()" needs to set it to "noindex, nofollow". Right? So... what would happen if a web crawler gets an error response visiting my homepage for the first time? What would happen if a web crawler gets 200 visiting my homepage for the first time, but a 500 visiting it a month later? And what would happen if I keep my "robots" meta tag to "index, follow" even if I'm showing errors?
Is there a workaround, a solution, for this problem? What would you do?
Many thanks!
Usually, if there is a 500 status code, something has gone wrong inside Apache itself and it can't run your index.php file, which results in another 500 status code. Apache repeats this error loop for a few iterations before it finally gives up and sends its own error page.
The only really safe way to display a custom page for a 500 status code is to use plain text or a basic .html or .shtml file that doesn't depend on anything else on your server, so that loading the error page doesn't trigger yet another 500.
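As a rough sketch of that setup, the 500 entry could point at a static file while the other codes keep going through PHP. The file name /500.html is only an assumption for illustration; any plain HTML file with no PHP or SSI in it would do.

```apache
# /500.html is a hypothetical static page: no PHP, no includes.
ErrorDocument 500 /500.html

# The remaining codes can still be handled by index.php.
ErrorDocument 403 /index.php?error=403
ErrorDocument 404 /index.php?error=404
```

Note that if the 500 is caused by a syntax error in the .htaccess file itself, Apache cannot read this directive either, so the ErrorDocument 500 line would probably need to live in the main server or virtual host configuration to cover that case.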
Usually, if a crawler encounters a 500, it will just skip the page temporarily and come back later. A 500 code is recoverable: it doesn't necessarily mean there is no page there, just that the server is having trouble at the moment. The bots are smart enough to tell what each error code means, as long as the page always sends the correct status code in the HTTP response headers.
Remember, if you use a PHP file as your error document, you should resend the HTTP status code with the header() function, before any output is sent, so that clients and crawlers see the right status, like so:
header("HTTP/1.1 404 Not Found");
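Since the question routes several codes through the same index.php, one way to do this is a small lookup that turns the "error" parameter into a status line. This is only a sketch: error_status_line() is a hypothetical helper name, and falling back to 404 for unknown values is an assumption, not something Apache requires. It avoids http_response_code(), which is not available in PHP 5.2.

```php
<?php
// Hypothetical helper: builds the status line for the codes used in the
// ErrorDocument directives. Written with array() so it runs on PHP 5.2.
function error_status_line($code)
{
    $phrases = array(
        400 => 'Bad Request',
        401 => 'Unauthorized',
        403 => 'Forbidden',
        404 => 'Not Found',
        500 => 'Internal Server Error',
    );

    $code = (int) $code;

    // Assumption: anything not in the list is treated as 404, so arbitrary
    // query-string input never ends up in a header.
    if (!isset($phrases[$code])) {
        $code = 404;
    }

    return 'HTTP/1.1 ' . $code . ' ' . $phrases[$code];
}

if (isset($_GET['error'])) {
    // header() has to be called before any page output is sent.
    header(error_status_line($_GET['error']));
}
```

With this in place at the top of index.php, the existing DrawErrorPage() / DrawHomepage() branch can stay as it is; the status line is sent first and the page body follows.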