google-kubernetes-engine, load-balancing, kubernetes-health-check

GKE ignores readiness probe from pod during high load


I have an app running in Kubernetes on a couple of pods, and I'm trying to improve our deployment experience (we're using rolling deployments), which is currently causing pain.

What I want to achieve:

  • each pod first goes not ready, so it receives no more traffic
  • then it finishes the requests it is currently processing
  • then it can be removed

This should all just work: you create a deployment with readiness and liveness probes, the load balancer picks them up, and traffic is routed accordingly. However, when I test my deployment, I see pods receiving requests even after they have switched to not ready. Specifically, it looks like the load balancer won't update its routing while a lot of traffic is coming in. I can see pods going "not ready" when I signal them, and if they aren't receiving traffic at the moment they switch state, they won't receive any afterwards. But if they are receiving traffic while switching, the load balancer just ignores the state change.
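
For reference, the pod state and the Service's endpoint list can be watched like this while a pod flips to not ready (the commands just use the names from my manifests):

# Watch pod readiness flip in real time
kubectl get pods -n mine -l app=my-service -w

# Watch the Service's endpoints; a pod whose readiness probe fails
# should drop out of this list shortly afterwards
kubectl get endpoints -n mine my-service-loadbalancer -w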

I'm starting to wonder how to handle this, because I can't see what I'm missing - it must be possible to host a high-traffic app on Kubernetes with pods going "not ready" without losing tons of requests.

My configurations

Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: my-service
  name: my-service
  namespace: mine
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-service
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: my-service
        env: production
    spec:
      containers:
      - name: my-service
        image: IMAGE ID
        imagePullPolicy: Always
        volumeMounts:
        - name: credentials
          mountPath: "/run/credentials"
          readOnly: true
        securityContext:
          privileged: true
        ports:
        - containerPort: 8080
          protocol: TCP
        lifecycle:
          preStop:
            exec:
              command: ["/opt/app/bin/graceful-shutdown.sh"]
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          periodSeconds: 1
          initialDelaySeconds: 5
          failureThreshold: 1
        livenessProbe:
          httpGet:
            path: /alive
            port: 8080
          periodSeconds: 1
          failureThreshold: 2
          initialDelaySeconds: 60
        resources:
          requests:
            memory: "500M"
            cpu: "250m"
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      terminationGracePeriodSeconds: 60
      nodeSelector:
        cloud.google.com/gke-nodepool: stateful

Service/loadbalancer

apiVersion: v1
kind: Service
metadata:
  name: my-service-loadbalancer
  namespace: mine
spec:
  selector:
    app: my-service
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
  type: LoadBalancer

Solution

  • This turned out to be an effect of long-lived connections, not of traffic volume. The cause seems to be that the load balancer doesn't close connections that are already open - and our testing setup was sending its traffic through a pool of long-running connections. So the load balancer was updating its routes, but the existing connections kept sending requests to the terminating pod.

    The upshot is that this strategy for zero downtime does work:

    • use a preStop hook to make your pod fail the readiness probe
    • make sure to wait a couple of seconds, so the routing has time to update
    • then let your pod terminate gracefully when it receives SIGTERM
    • make sure your terminationGracePeriodSeconds is large enough to cover both the preStop hook and the actual termination period (see the sketch below)
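
    For illustration, here is a minimal sketch of what such a preStop script could look like. The drain-file mechanism (the /ready handler starting to fail once /tmp/draining exists) and the 10-second wait are assumptions made for this example, not details of the actual graceful-shutdown.sh:

    #!/bin/sh
    # Sketch of a preStop script along the lines of /opt/app/bin/graceful-shutdown.sh.
    # Assumption: the app's /ready handler returns a non-2xx response once this
    # file exists, so the readiness probe (periodSeconds: 1, failureThreshold: 1)
    # starts failing almost immediately.
    touch /tmp/draining

    # Give the endpoint controller and the load balancer a few seconds to stop
    # routing new traffic to this pod before it is told to shut down.
    sleep 10

    # When this script exits, the kubelet sends SIGTERM to the container, which
    # can then finish its in-flight requests and exit. Both the sleep above and
    # that drain time have to fit within terminationGracePeriodSeconds (60s here).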