Tags: kubernetes, multi-tenant, readiness-probe

Kubernetes Readiness Probe in multi-tenant system


I am designing a system with a single Kubernetes deployment serving multiple tenants, but with separate databases, queues, etc. per customer. Anything stateless is shared, and anything with state is separate for each tenant. Based on the request host (tenant1.company.com or tenant2.company.com), the code connects to the corresponding databases and queues.

How should readiness probes be designed in this case, where a single pod is meant to serve multiple tenants?

I can think of the following options, none of which seem correct:

  1. Connect to all databases and queues and check that they are ready. Disadvantage: the pod will be marked not ready even if only one tenant's resource is down.
  2. Connect to any one database and queue. Disadvantage: it doesn't really check readiness for all tenants.
  3. Do not have any readiness probe at all.

It feels like if I have separation at the resource level to support multiple tenants (this is a B2B multi-tenant system, and onboarding a new tenant takes time and effort), I also need separation at the Kubernetes deployment level.

Is the standard approach either complete separation at all levels, or a single unified system with the same shared resources? If not, how do I design the readiness probes?


Solution

  • As I understand it, you are trying to extend the Kubernetes Pod's readiness probe to reflect application health for a specific tenant. Unfortunately, the readiness probe is not designed for that.

    The only purpose of the Kubernetes readiness probe (even with the newer Pod Ready++ feature, i.e. Pod readiness gates) is to reflect a particular Pod's ability to serve traffic. The Deployment and StatefulSet controllers take the Pod's readiness state into account during the rolling-update process.

    You can break the whole update mechanism if you make the readiness probe depend on components external to the Pod or on the connectivity of network endpoints. The correct way to use a readiness probe is to check only the conditions of the Pod's internal components.
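    For illustration only, a minimal sketch of what that looks like in a Deployment manifest, assuming a hypothetical /ready endpoint on port 8080 that checks only in-Pod state (config loaded, HTTP server up) and never touches the per-tenant databases or queues:

    ```yaml
    # Hypothetical fragment -- names, image and port are assumptions.
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: frontend
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: frontend
      template:
        metadata:
          labels:
            app: frontend
        spec:
          containers:
            - name: app
              image: registry.example.com/frontend:1.0   # placeholder image
              ports:
                - containerPort: 8080
              readinessProbe:
                httpGet:
                  path: /ready      # local check only, no tenant DB/queue calls
                  port: 8080
                initialDelaySeconds: 5
                periodSeconds: 10
    ```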

    Kubernetes documentation pages:
    https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/
    https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/

    For a simple application or microservice that consists of a single Pod, the readiness probe may also reflect the state of the application as a whole. But usually the application architecture is much more complex and contains many parts, each of which may have its own dependencies.

    Sometimes it's cheaper and simpler to create your own health check endpoint in the frontend app (e.g. www.example.com/healthz) that reflects the health of the whole application, taking into account the statuses of all its components and their dependencies, or to collect and aggregate JSON statuses from the other components.

    In the Kubernetes world, components/apps are usually exposed as Services that balance traffic across one or more Pods. Thus, a component is healthy if at least one Pod behind the corresponding Service is in the Ready state. The number of Ready Pods behind a Service says more about the app's performance than about its health.

    As far as I can picture your app's design:

    • I would use a separate Ingress object per tenant that forwards traffic to that tenant's dedicated frontend in the tenant's Namespace. All of the tenant's other resources are also deployed there (see the sketch after this list).
    • I would put all shared components in an additional Namespace, let's say "shared/static/common/stateless", and create an ExternalName Service in each tenant's Namespace to access them (or an Ingress, if this content is served on a specific URL path).
    • I would also deploy a solution for application and cluster monitoring.
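    As a sketch of the per-tenant routing described above (the Namespace names "tenant1" and "shared" and the Service names are assumptions; the host comes from the question):

    ```yaml
    # Hypothetical Ingress in the tenant's Namespace, forwarding the tenant's
    # host to that tenant's dedicated frontend Service.
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: frontend
      namespace: tenant1
    spec:
      rules:
        - host: tenant1.company.com
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: frontend
                    port:
                      number: 80
    ---
    # Hypothetical ExternalName Service in the tenant's Namespace that points
    # at a shared component running in the "shared" Namespace.
    apiVersion: v1
    kind: Service
    metadata:
      name: shared-assets
      namespace: tenant1
    spec:
      type: ExternalName
      externalName: assets.shared.svc.cluster.local
    ```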

    This way you can easily scale individual application parts if some tenant needs more resources.
    To manage the deployments I would use Helm charts; this way I can easily deploy one more tenant, or remove or update an existing one (see the values sketch below).
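    For example, each tenant could be an instance of the same chart with its own values file; the chart name and keys below are assumptions, not a prescribed layout:

    ```yaml
    # values-tenant1.yaml -- hypothetical per-tenant values for a shared chart,
    # installed e.g. with: helm install tenant1 ./tenant-chart -f values-tenant1.yaml
    tenant: tenant1
    host: tenant1.company.com        # host from the question
    namespace: tenant1
    frontend:
      replicas: 2
    database:
      host: tenant1-db               # tenant-specific stateful resources
    queue:
      host: tenant1-queue
    ```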

    There are many different solutions for monitoring application health and performance, collecting metrics and logs, and taking action when certain conditions are met.

    PS: In case you want to implement a circuit breaker for the tenants, Istio has built-in functionality.