Tags: spring-boot, kubernetes, google-kubernetes-engine, ingress-nginx

Pod loses network connection (connection reset errors) during preStop period


Environment:
I am running a Spring Boot HTTP server and an HTTP client pod, where the client sends requests to the server using the myserver.svc.cluster.local address. The server and client communicate over a keep-alive session, and the server is configured with a 10-second preStop hook to cover rolling updates (a minimal sketch of the setup follows the diagram).

 A(pod)---
          | -- myserver(svc) <--[HTTP request with keep-alive]-- myclient(svc) --- C(pod)
 B(pod)---
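
For reference, a minimal sketch of the setup described above (all names, the image, and the sleep-based hook are illustrative, not the actual manifests):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myserver
spec:
  replicas: 2                        # Pods A and B
  selector:
    matchLabels:
      app: myserver
  template:
    metadata:
      labels:
        app: myserver
    spec:
      containers:
      - name: server
        image: myserver:latest       # Spring Boot HTTP server (placeholder image)
        ports:
        - containerPort: 8080
        lifecycle:
          preStop:
            exec:
              command: ["sleep", "10"]   # 10-second preStop period
---
apiVersion: v1
kind: Service
metadata:
  name: myserver
spec:
  selector:
    app: myserver
  ports:
  - port: 80
    targetPort: 8080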

Issue:
When performing a rolling update on myserver, the client experiences two connection reset errors. Each reset occurs immediately as a server pod enters the Terminating state (i.e., at the beginning of its preStop period).


Detailed condition:
Although preStop is set to 10 seconds, which should in theory allow existing keep-alive sessions to continue while new HTTP traffic is no longer accepted, the server's IP is removed from the Endpoints the moment the preStop hook starts. As a result, keep-alive sessions that should have stayed with Pod A are transferred to Pod B.

For example, the client sends an HTTP GET request with sequence number 61293 on a keep-alive session established with Pod A. As soon as preStop starts on Pod A, this packet is redirected to Pod B. Pod B, receiving a packet with an unexpected sequence number, sends an RST packet back to the client, resulting in a connection reset error.

What I Expected:
I expect that new HTTP traffic will not reach the pod during the preStop period, but I also expect that existing keep-alive sessions are preserved. That is not happening. Is this a bug, or is it intended behavior?


Solution

  • TL;DR: Kubernetes Services don't support connection draining.

    As described here and here, from the networking perspective the pod termination flow looks like this:

    1. The pod termination API call is received.
    2. The Pod is immediately removed from Endpoints, but it is still listed in EndpointSlices as:
    "conditions": {
       "ready": false,
       "serving": true,
       "terminating": true
    }
    

    As you can see, the Pod is marked as still serving traffic, but it no longer accepts new connections.

    3. Kubernetes waits for the Pod's preStop hook to complete, then sends the SIGTERM signal to it. The Pod is removed from EndpointSlices.
    4. Kubernetes waits until the terminationGracePeriodSeconds period is over, then sends SIGKILL to the Pod (if it hasn't finished by that point).
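
    Mapped onto pod spec fields, that timeline is driven by two knobs. Here is an illustrative pod spec fragment (image name and values are placeholders, not taken from the question); note that the preStop hook's runtime counts against terminationGracePeriodSeconds:

    # Illustrative fragment; comments map the fields to the steps above.
    spec:
      terminationGracePeriodSeconds: 30   # step 4: SIGKILL once this period expires
      containers:
      - name: server
        image: myserver:latest            # placeholder image
        lifecycle:
          preStop:
            exec:
              command: ["sleep", "10"]    # step 3: SIGTERM is sent only after this completes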

    Unfortunately, these EndpointSlice conditions are not used by Kubernetes Services when routing traffic. EndpointSlices were created to allow external networking implementations to support advanced deployment techniques such as truly zero-downtime updates, as illustrated below.
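
    For illustration, those conditions live on individual endpoints inside an EndpointSlice object, which such external implementations can watch (names and IPs below are made up):

    apiVersion: discovery.k8s.io/v1
    kind: EndpointSlice
    metadata:
      name: myserver-abc12                # generated name (illustrative)
      labels:
        kubernetes.io/service-name: myserver
    addressType: IPv4
    ports:
    - name: http
      port: 8080
      protocol: TCP
    endpoints:
    - addresses: ["10.0.1.5"]             # Pod A (illustrative IP)
      conditions:
        ready: false                      # don't send new connections here
        serving: true                     # existing connections can still be served
        terminating: true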

    You may use load balancer implementations that support this functionality, such as cloud load balancers (AWS ALB, GKE load balancers, Azure Application Gateway). Nginx Ingress does not support it.

    If you're running in GKE you can read more about it here: https://cloud.google.com/kubernetes-engine/docs/troubleshooting/load-balancing#500-series-errors
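
    For example, with GKE container-native load balancing the load balancer targets pod IPs directly through network endpoint groups, and a BackendConfig can enable connection draining. A hedged sketch (the annotations and BackendConfig fields are standard GKE APIs; names and values are illustrative):

    apiVersion: v1
    kind: Service
    metadata:
      name: myserver
      annotations:
        cloud.google.com/neg: '{"ingress": true}'                     # container-native LB
        cloud.google.com/backend-config: '{"default": "myserver-bc"}'
    spec:
      selector:
        app: myserver
      ports:
      - port: 80
        targetPort: 8080
    ---
    apiVersion: cloud.google.com/v1
    kind: BackendConfig
    metadata:
      name: myserver-bc
    spec:
      connectionDraining:
        drainingTimeoutSec: 60   # keep existing connections alive while draining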