Which HTTP status code should I use for a health-check failure?

I'm implementing a /_status/ endpoint which does some sanity checks on data in our database.

For example, we are collecting measurements and the status should go "bad" if the latest measurement is over an hour old.

I would like to point Pingdom at this URL to leverage their alerting infrastructure and tell us when something's wrong.

On a "good" status I will serve an HTML page with an HTTP 200 OK status. But what would an appropriate HTTP status code be for "bad"? Or would it be more correct not to convey this information via status code, but via HTML content instead?

Thanks!

Solution

We just had a similar discussion in our group. We decided for our purposes that the HTTP response codes should be reporting on your server's success or failure to honor the request. For a GET, this would mean whether or not you can respond with the requested resource. In this case, the requested resource is a health report, so as long as you're returning that successfully, it should be a 200 response.

We're returning JSON for our health check, with a top-level "isHealthy" field set to true or false. Our load balancer and other monitors will parse the JSON and use this field to determine if the system is healthy or not.

If you don't want to parse JSON in your monitors, you could try putting a custom response header to indicate binary health of the system, e.g., System-Health: true or System-Health: false. You might have better luck getting monitors which can check that.

If you really want to use a response code, I would recommend an additional endpoint called something like "health" which returns a "204 No Content" when healthy, and a "404 Not Found" when not healthy. In this case, the resource defined by the URL is, symbolically, the health of your system, and so if it's healthy, you can return a successful response. If it's unhealthy, then it's health can't be found, hence the 404.