Tags: http, kubernetes, keep-alive, kubernetes-health-check

Fixing Kubernetes service redeploy errors with keep-alive enabled


We have a Kubernetes service running on three machines. Clients both inside and outside our cluster talk to this service over HTTP with keep-alive enabled. During a deploy of the service, the exiting pods have a readiness check that starts to fail when shutdown begins, and they are removed from the service endpoints list as expected. However, they still receive traffic, and some requests fail when the container abruptly exits. We believe this is because keep-alive allows clients to reuse connections that were established while the host was Ready. Is there a series of steps we should follow to avoid these failures? We'd like to keep allowing keep-alive connections if at all possible.


Solution

  • This issue occurs when proxying or load balancing happens at layer 4 instead of layer 7. For internal services (Kubernetes services of type ClusterIP), kube-proxy proxies at layer 4, so clients keep their established connections even after the pod is no longer ready to serve. Similarly, for services of type LoadBalancer, the same issue occurs if the backend protocol is set to TCP (which is the default with AWS ELB). Please see this issue for more details.

    The current solutions to this problem are:

    • If you are using a cloud load balancer, set the backend protocol to HTTP. For example, you can add the service.beta.kubernetes.io/aws-load-balancer-backend-protocol annotation to the Kubernetes service and set it to HTTP so that the ELB uses HTTP proxying instead of TCP.
    • Use a layer 7 proxy or ingress controller within the cluster to route the traffic instead of sending it through kube-proxy.
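
As a rough illustration of the two options above, the manifests below show what each might look like. This is only a sketch under assumptions: the names my-service, my-app, and my-service-ingress, the port numbers, and the nginx ingress class are all hypothetical placeholders, and the Ingress assumes an ingress controller is already installed in the cluster. The two documents are alternatives, not a pair that must be applied together.

```yaml
# Option 1: Service of type LoadBalancer with the AWS ELB backend set to HTTP.
# Names, labels, and ports are hypothetical placeholders.
apiVersion: v1
kind: Service
metadata:
  name: my-service
  annotations:
    # Ask the ELB to proxy at layer 7 (HTTP) instead of layer 4 (TCP).
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: http
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
    - name: http
      port: 80
      targetPort: 8080
---
# Option 2: route in-cluster traffic through a layer 7 ingress controller.
# Assumes a controller serving the "nginx" ingress class is installed and
# that a Service named my-service (type ClusterIP is enough) already exists.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-service-ingress
spec:
  ingressClassName: nginx
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-service
                port:
                  number: 80
```

In both cases the layer 7 proxy owns the client-facing keep-alive connection and chooses a backend per request, so a pod that has been removed from the endpoints list should stop receiving new requests even though clients keep their connections open.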