kubernetes, microservices, readinessprobe

What should a Kubernetes readiness check check?


I understand how to set up a readiness probe in Kubernetes, but are there any best practices for what the microservice should actually check when the readiness probe is called? Two specific examples:

  1. A microservice that fronts a db, where practically no functionality works without a functioning db connection. Here I think pinging the db would be reasonable, failing the readiness check if ping fails. Is this recommended?
  2. A microservice that uses N other microservices, but where failure to connect to any one would still allow for a majority of the functionality to work. Here I think checking for connectivity to the backing services is ill advised. In this case, assuming there is no extensive "boot" or "warm up" processing, liveness and readiness are equivalent. Correct?

Thank you


Solution

  • No, I don't think there are best practices for readiness probes.

    It all depends on the application and what you expect to happen.

    "Here I think pinging the db would be reasonable, failing the readiness check if ping fails"

    Let me comment on this. Imagine you have a backend microservice (a Deployment with several replicas) that communicates with a db. When the db fails (assuming no replication, or some serious db downtime), the readiness probes of all your pod replicas start to fail and the pods' endpoints are removed from the Service. When a client then tries to access the Service, the request fails at the connection level, typically with a connection timeout or a refused connection, because no pod is left behind the Service to handle it.
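To make the scenario concrete, here is a minimal sketch of a readiness check that fails when the db is unreachable. It uses an in-memory SQLite connection as a stand-in for the real database; the function names and the idea of returning a `(status, body)` pair are assumptions for illustration, not part of any Kubernetes API. The kubelet treats an HTTP probe response with a status from 200 to 399 as success, so returning 503 here is what takes the pod out of the Service.

```python
import sqlite3


def db_is_reachable(conn):
    """Return True if a trivial query succeeds on the given connection."""
    try:
        conn.execute("SELECT 1").fetchone()
        return True
    except sqlite3.Error:
        return False


def readiness(conn):
    """Return (status_code, body) for a hypothetical readiness endpoint."""
    if db_is_reachable(conn):
        return 200, "ready"
    return 503, "database unreachable"


# Stand-in for the real database (assumption: any DB-API connection would do).
conn = sqlite3.connect(":memory:")
ok_status, ok_body = readiness(conn)

# Closing the connection simulates the db going away.
conn.close()
down_status, down_body = readiness(conn)
```

If every replica runs this check against the same db, they all turn unready at the same moment, which is exactly the "no endpoints left" situation described above.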

    You have to ask yourself whether this is the behaviour you want and expect. It may be much more convenient for the readiness probe not to fail when the db fails: the microservice would still receive traffic and could return an error message to the client describing the problem.

    Even a simple 503 would be much better in this case. Getting an actual error message tells me much more about the underlying issue than getting a connection timeout.
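A rough sketch of that alternative: the pod stays ready, and the request handler itself turns a db failure into a 503 with an explanatory body. The `DatabaseUnavailable` exception, the `fetch_orders` helper, and the `db_up` flag are all hypothetical and only simulate a failing database call.

```python
import json


class DatabaseUnavailable(Exception):
    """Hypothetical error raised when the db cannot be reached."""


def fetch_orders(db_up):
    """Simulated data access; db_up stands in for a real connection attempt."""
    if not db_up:
        raise DatabaseUnavailable("orders database is not reachable")
    return [{"id": 1, "item": "book"}]


def handle_orders_request(db_up):
    """Return (status, headers, body) for a hypothetical orders endpoint."""
    try:
        orders = fetch_orders(db_up)
    except DatabaseUnavailable as exc:
        # The client gets a clear message instead of a connection timeout.
        body = json.dumps({"error": str(exc)})
        return 503, {"Retry-After": "5"}, body
    return 200, {}, json.dumps({"orders": orders})


status, headers, body = handle_orders_request(db_up=False)
```

The `Retry-After` value is arbitrary here; the point is only that the response carries information the client can act on.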


    [...] but where failure to connect to any one would still allow for a majority of the functionality to work. Here I think checking for connectivity to the backing services is ill advised. In this case, assuming there is no extensive "boot" or "warm up" processing, liveness and readiness are equivalent.

    It depends on the use case. In application code you can react much more quickly to problems with backing services, and I would use this approach whenever I can, using readiness to check backing services only when the failure can't be handled any other way.
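One way to handle a failing backing service in application code is to fall back to a default value for that part of the response and report it as degraded. The sketch below assumes each backing service is wrapped in a plain callable; the service names and the `build_dashboard` function are made up for illustration.

```python
def call_with_fallback(call, fallback):
    """Run call(); on a connection-style error return the fallback instead."""
    try:
        return call(), True
    except OSError:
        # ConnectionError and TimeoutError are both subclasses of OSError.
        return fallback, False


def build_dashboard(backends):
    """Collect data from every backend, noting which ones were unavailable."""
    data, degraded = {}, []
    for name, (call, fallback) in backends.items():
        value, ok = call_with_fallback(call, fallback)
        data[name] = value
        if not ok:
            degraded.append(name)
    return {"data": data, "degraded": degraded}


def recommendations_down():
    """Simulates a backing service that cannot be reached."""
    raise ConnectionError("recommendations service unreachable")


backends = {
    "profile": (lambda: {"name": "alice"}, {}),
    "recommendations": (recommendations_down, []),
}
result = build_dashboard(backends)
```

With this shape the service keeps answering requests with most of its functionality intact, and the readiness probe does not need to know about the N backing services at all.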


    So for me, the liveness probe answers the question: "Is this application still running?" And the readiness probe answers the question: "Is this application ready to handle (capable of handling) traffic?"

    And it's up to you to define what it means to "still run" and to "be able to handle traffic".

    But usually, if the application is running, it is also able to handle traffic, so in that case liveness and readiness are indeed equivalent.
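For the case where the two probes do differ, here is a minimal sketch with a warm-up phase. The `HealthState` class and its method names are assumptions for illustration; in a real service these would sit behind two HTTP endpoints that the probes call. A failing liveness probe makes the kubelet restart the container, while a failing readiness probe only removes the pod from the Service endpoints.

```python
class HealthState:
    """Tracks whether the process is running and whether it has warmed up."""

    def __init__(self):
        self.warmed_up = False

    def mark_warmed_up(self):
        """Call once caches are loaded, connections opened, and so on."""
        self.warmed_up = True

    def liveness(self):
        """If this code runs at all, the process is alive."""
        return 200, "alive"

    def readiness(self):
        """Ready only after the warm-up phase has finished."""
        if self.warmed_up:
            return 200, "ready"
        return 503, "warming up"


state = HealthState()
before = state.readiness()   # during warm-up: alive but not ready
state.mark_warmed_up()
after = state.readiness()    # after warm-up: both checks pass
```

Without a warm-up phase, `readiness` would always return the same result as `liveness`, which matches the "equivalent" case above.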