Search code examples
httphttp-status-codeshttp-status-code-503health-monitoringkubernetes-health-check

What should the HTTP Status Code of a Degraded Health Check Be?


I have a health check endpoint at /status that returns the following status codes and response bodies:

  • Healthy - 200 OK
  • Degraded - ?
  • Unhealthy - 503 Service Unnavailable

What should the HTTP status code be for a degraded response be? A 'degraded' check is used for checks that did succeed but are slow or unstable. What HTTP status code makes the most sense?


Solution

  • The most suitable HTTP status code for a "Degraded" status response from a health endpoint is nothing other than 200 OK.

    I say this because I can't find any better code in the official Hypertext Transfer Protocol (HTTP) Status Code Registry maintained by IANA, pointed to by [RFC7231] HTTP/1.1: Semantics and Content. Unofficial codes should be avoided, because they only make your API more difficult to understand.

    You should design your APIs so that they become easy to use. Resource names, HTTP verbs, status codes, etc. should be more or less self-explanatory, so that people who already know "the REST language" can immediately understand how to use your API without having to decipher vague names or unusual status codes. Which brings me to the next part of my answer...

    Other comments on your design

    The most natural way to interpret a 5xx response to any request is that the operation in question failed.

    So a 503 Service Unavailable response to a GET /status request means that the status checking operation itself failed. Such a response would only be useful if we can be certain that /status is a health endoint, as pointed out in the API Health Check draft referred to in Nkosi's answer:

    A health endpoint is only meaningful in the context of the component it indicates the health of. It has no other meaning or purpose. As such, its health is a conduit to the health of the component. Clients SHOULD assume that the HTTP response code returned by the health endpoint is applicable to the entire component (e.g. a larger API or a microservice).

    But with a URL path of just /status, it is not completely obvious that this really is a health endpoint. From looking at the URL, we only know that it returns information about the status of something, but we can't really be sure what that "something" is.

    Since you're also telling us that yes, it is in fact a health endpoint, I must suggest that you change the name to health. I would also suggest placing it under some base path, e.g. /things/health, to make it more clear which component it indicates the health of.

    If, on the other hand, /status was actually a resource of it own, i.e. something that represents the status of some other component/thing (like its name currently suggests), then 200 OK is the only reasonable status for successful invocations, even if the thing that it indicates the status of is "Unhealthy". In that case, a 5xx would mean that no status could be obtained, and details in the response payload would be assumed to be related to a failure in the /status service itself.

    So be careful with how you name things and what status codes you use!