Tags: tcp, grpc, http2

Difference between gRPC keepalive and TCP keepalive


HTTP/2 and gRPC support a keepalive feature (not to be confused with the HTTP/1.1 Keep-Alive header).

https://github.com/grpc/grpc/blob/master/doc/keepalive.md

This feature periodically sends a PING and waits for the PING response to check whether the connection is still alive and not stuck in a "half-open" state.
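On the wire, this keepalive is an HTTP/2 PING frame. A minimal sketch in Go, following the frame layout in the HTTP/2 specification (RFC 9113, section 6.7): a PING frame has type 0x6, stream ID 0 and exactly 8 bytes of opaque payload, and the peer answers with the same payload and the ACK flag (0x1) set. The helpers `buildPing` and `isAckFor` are hypothetical names for illustration; real gRPC libraries do this inside their HTTP/2 transport.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// HTTP/2 constants for PING frames (RFC 9113, section 6.7).
const (
	frameTypePing = 0x6
	flagAck       = 0x1
	pingPayload   = 8 // PING frames always carry exactly 8 bytes of opaque data
)

// buildPing (hypothetical helper) encodes a PING frame on stream 0.
// When ack is true the ACK flag is set, which is how a peer answers a PING.
func buildPing(data [8]byte, ack bool) []byte {
	frame := make([]byte, 9+pingPayload) // 9-byte frame header + payload
	// Bytes 0..2: 24-bit payload length (the upper two bytes stay zero).
	frame[2] = pingPayload
	frame[3] = frameTypePing
	if ack {
		frame[4] = flagAck
	}
	// Bytes 5..8: reserved bit + 31-bit stream identifier, which must be 0 for PING.
	binary.BigEndian.PutUint32(frame[5:9], 0)
	copy(frame[9:], data[:])
	return frame
}

// isAckFor (hypothetical helper) reports whether frame is a PING ACK that
// echoes the given payload, i.e. the response the sender is waiting for.
func isAckFor(frame []byte, data [8]byte) bool {
	if len(frame) != 9+pingPayload {
		return false
	}
	length := int(frame[0])<<16 | int(frame[1])<<8 | int(frame[2])
	streamID := binary.BigEndian.Uint32(frame[5:9]) & 0x7fffffff
	return length == pingPayload &&
		frame[3] == frameTypePing &&
		frame[4]&flagAck != 0 &&
		streamID == 0 &&
		bytes.Equal(frame[9:], data[:])
}

func main() {
	payload := [8]byte{1, 2, 3, 4, 5, 6, 7, 8}
	ping := buildPing(payload, false)
	ack := buildPing(payload, true)
	fmt.Printf("ping: % x\n", ping)
	fmt.Println("ack matches:", isAckFor(ack, payload))
}
```

If no matching ACK arrives within some timeout, the sender treats the connection as dead and closes it.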

But TCP also has a keepalive feature, which periodically sends probe packets on the connection to verify that it is still open and to keep load balancers from dropping it for being idle.

Since these two keepalive features do exactly the same thing, when should you use HTTP/2 keepalive instead of TCP keepalive? And why do both exist?


Solution

  • gRFC A8 discusses the gRPC keepalive feature. Its Background section briefly describes how TCP keepalive works, and its Rationale section discusses the reasons to use HTTP/2 PING.

    I'll also mention that gRPC implementations often have TCP keepalive enabled, but just use the default OS configuration.

    Some quotes from the gRFC:

    TCP keepalive is hard to configure in Java and Go.

    Enabling keepalive is easy, but configuring it is troublesome. Java often can't configure it at all. Go allows you to configure it, but the API sets both the time and interval to the same value, such that setting a period of 5 minutes takes an hour (5 minutes multiplied by 10 or 11 retries) for the connection to be determined as dead. So the Go API requires very aggressive keepalive settings before becoming useful.

    TCP keepalive is active even if there are no open streams.

    The gRFC mentions mobile, but datacenter traffic can be affected by this as well because of the large number of connections. If the connection is inactive, it is often better to release the resource instead of spending more resources to keep it alive. That is possible with IDLE_TIMEOUT (available in Java and C, IIRC).

    There are no known generally available methods usable to gRPC for detecting abusive usage of TCP keepalive.

    TCP keepalives are very hard to detect for network monitoring and abuse control. A client could set a very aggressive value and the server has to unknowingly pay part of the cost. At large scale, overly aggressive keepalives could be a material cost (10+% of network traffic), and it would be very difficult to notice, fix, and prevent recurrence. The HTTP/2 PING based design allows aggressive keepalive, but only if the server accepts the associated costs.