
Kubernetes livenessProbe shutdown during application startup


I am working with:

kubernetes 1.3.6

.. with this part in the deployment file of my application:

    livenessProbe:
      httpGet:
        path: /liveness
        port: 8082
      initialDelaySeconds: 120

.. so that when I describe the pod I get this:

Liveness: http-get http://:8082/liveness delay=120s timeout=1s period=10s #success=1 #failure=3

My application usually starts in 110-115 seconds, but sometimes it takes longer (due to DB delays, external service retries, etc.).

The problem I see is that when startup takes more than 130-140 seconds (initialDelaySeconds plus one or two probe periods), Kubernetes forces a shutdown and the pod restarts from scratch. With a lot of replicas (50-60), this means a full deployment sometimes takes 10-15 minutes longer than a normal one. An obvious workaround is to increase initialDelaySeconds, but then every deployment takes much longer.
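For reference, the restart timing follows directly from the probe parameters in the describe output above: the kubelet sends the first probe after initialDelaySeconds, repeats it every periodSeconds, and restarts the container after failureThreshold consecutive failures. With delay=120s, period=10s and #failure=3, the three failing probes land at roughly 120s, 130s and 140s:

    initialDelaySeconds + (failureThreshold - 1) × periodSeconds
    = 120s + 2 × 10s
    = 140s   (third consecutive failure → restart)

which matches the observed 130-140 second window.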

I had a look at the probe API reference and there's nothing there that seems to solve this problem: http://kubernetes.io/docs/api-reference/v1/definitions/#_v1_probe

Ideally I would like something that works the opposite way: not an "initialDelaySeconds", but a maximum amount of time for the pod to start. If that time passes, Kubernetes forces the pod shutdown and tries again.


Solution

  • I finally ended up with a good solution that, at the moment, works perfectly!

    I set:

    • readinessProbe.initialDelaySeconds: equal to the minimum startup time of the application
    • livenessProbe.initialDelaySeconds: equal to the maximum startup time of the application, plus a couple of seconds

    That way, Kubernetes starts checking the readiness probe (after readinessProbe.initialDelaySeconds) to decide when to add the pod to load balancing, and only later (after livenessProbe.initialDelaySeconds) also starts checking the liveness probe to decide whether the pod needs restarting.
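    A minimal sketch of the resulting deployment snippet, assuming a minimum startup time of about 110 seconds and a maximum of about 150 seconds (these numbers are illustrative and must come from measuring your own application; the path and port match the probe shown earlier):

    ```yaml
    # Readiness gates traffic: start probing at the minimum observed startup time.
    readinessProbe:
      httpGet:
        path: /liveness          # reusing the same endpoint; a dedicated /readiness path also works
        port: 8082
      initialDelaySeconds: 110   # minimum observed startup time (assumed)
      periodSeconds: 10
    # Liveness gates restarts: start probing only after the slowest observed startup.
    livenessProbe:
      httpGet:
        path: /liveness
        port: 8082
      initialDelaySeconds: 150   # maximum observed startup time + a couple of seconds (assumed)
      periodSeconds: 10
      failureThreshold: 3
    ```

    The key design point is that the liveness probe never fires while a slow-but-healthy pod is still starting, while the readiness probe still keeps the pod out of the load balancer until it actually responds.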