We have a bunch of pods that use RabbitMQ. If the pods are shut down by K8S with SIGTERM, we have found that our RMQ client (Python Pika) has no time to close the connection to the RMQ server, so the server considers those clients alive until two heartbeats are missed.
Our investigation turned up that on SIGTERM, K8S kills all inbound and, most importantly, outbound TCP connections, among other things (removing endpoints, etc.). We tried to see whether any connections were still possible during a preStop hook, but preStop seems very internally focused and no traffic got out.
Has anybody else experienced this issue and solved it? All we need is to get one message out the door before the kubelet slams it. Our pods are not K8S "Services", so some of the suggestions we found didn't help.
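For reference, here is a minimal sketch of the kind of shutdown handling we are trying to achieve (the host name, queue name, and message body are placeholders, and it assumes a pika.BlockingConnection):

import signal
import sys

import pika

# Placeholders: the real host, queue, and payload are application-specific.
RABBITMQ_HOST = "rabbitmq.example.internal"
QUEUE = "worker-status"

connection = pika.BlockingConnection(pika.ConnectionParameters(host=RABBITMQ_HOST))
channel = connection.channel()
channel.queue_declare(queue=QUEUE, durable=True)

def handle_sigterm(signum, frame):
    # Try to get one last message out and close the AMQP connection cleanly,
    # so the broker does not have to wait for two missed heartbeats.
    try:
        channel.basic_publish(exchange="", routing_key=QUEUE, body=b"shutting down")
        connection.close()
    except Exception:
        pass  # networking may already be gone; nothing more we can do
    sys.exit(0)

signal.signal(signal.SIGTERM, handle_sigterm)

# ... normal consuming / publishing work happens here ...

In our testing the handler never gets the message out, because the pod's outbound networking is already dead by the time it runs.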
Steps to reproduce:
k delete pod
to start termination of the Sender pod.

We tested this extensively and found that new EKS clusters with Calico installed (see below) will experience this problem unless Calico is upgraded: networking is killed immediately when a pod is sent SIGTERM instead of staying up for the grace period. If you're experiencing this problem and are using Calico, please check the version of Calico against this thread:
https://github.com/projectcalico/calico/issues/4518
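If you want to check whether your cluster behaves this way, a rough probe like the one below works (a sketch, Python standard library only; the broker host and port are placeholders): run it in a pod, delete the pod, and watch the logs. On SIGTERM it keeps attempting outbound TCP connections and reports when they start failing; on an affected cluster they fail immediately, on a healthy one they keep succeeding until the grace period expires.

import signal
import socket
import time

# Placeholders: point these at something outside the pod, e.g. the RMQ broker.
TARGET_HOST = "rabbitmq.example.internal"
TARGET_PORT = 5672

def probe_after_sigterm(signum, frame):
    # Keep trying outbound TCP connects for the rest of the grace period
    # and log each attempt, so the logs show exactly when networking died.
    start = time.time()
    while True:
        try:
            with socket.create_connection((TARGET_HOST, TARGET_PORT), timeout=2):
                status = "ok"
        except OSError as exc:
            status = f"failed: {exc}"
        print(f"+{time.time() - start:5.1f}s outbound connect {status}", flush=True)
        time.sleep(1)

signal.signal(signal.SIGTERM, probe_after_sigterm)
print("probe running; delete the pod to start the test", flush=True)
signal.pause()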
If you're installing Calico using the AWS yaml found here: https://github.com/aws/amazon-vpc-cni-k8s/tree/master/config
Be advised that the fixes have NOT landed in any of the released versions; we had to install from master, like so:
kubectl apply \
-f https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/master/config/master/calico-operator.yaml \
-f https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/master/config/master/calico-crs.yaml
We also upgraded the AWS CNI for good measure, although that wasn't explicitly required to solve our issue:
kubectl apply -f https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/v1.8.0/config/v1.8/aws-k8s-cni.yaml
kubectl apply -f https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/v1.9.1/config/v1.9/aws-k8s-cni.yaml
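To confirm the new images actually rolled out, a quick check along these lines helps (a sketch using the official kubernetes Python client; the DaemonSet names and namespaces below are assumptions about a typical install, so adjust them to yours):

from kubernetes import client, config
from kubernetes.client.rest import ApiException

config.load_kube_config()
apps = client.AppsV1Api()

# (name, namespace) pairs are assumptions: operator-based Calico installs put
# calico-node in calico-system, older manifests used kube-system, and the
# AWS CNI runs as aws-node in kube-system.
daemonsets = [
    ("calico-node", "calico-system"),
    ("calico-node", "kube-system"),
    ("aws-node", "kube-system"),
]

for name, namespace in daemonsets:
    try:
        ds = apps.read_namespaced_daemon_set(name=name, namespace=namespace)
    except ApiException:
        continue  # not present in this namespace
    images = [c.image for c in ds.spec.template.spec.containers]
    print(f"{namespace}/{name}: {', '.join(images)}")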
There is also a bunch of confusing documentation from AWS that makes it seem like you should switch to the new AWS "add-ons" to manage this stuff, but after an extensive discussion with support, we were advised against it.