elasticsearch nest keep-alive tcp-keepalive

Why does NEST include TCP keep alive?


I noticed NEST has the ability to set TCP keep alive.

What problem is this trying to solve?

I thought HTTP keep-alive, which NEST's internal connection pooling uses by default, should be enough?

Can someone please shed light on the differences here, and on which scenarios call for each?

Thanks


Solution

  • Not a NEST developer, but I do run a managed Elasticsearch service. While “keep alive” sounds valuable, it’s more valuable for an HTTP connection than for a TCP connection.

    Per Wikipedia:

    Typically TCP Keepalives are sent every 45 or 60 seconds on an idle TCP connection, and the connection is dropped after 3 sequential ACKs are missed.

    That sounds pretty handy, but if you take some measurements, establishing a TCP connection is likely to take less than a millisecond within the same datacenter.

    Given we’re looking at Elasticsearch activity, we’re now in the domain of HTTP, and setting up an HTTP connection can have a fair bit more overhead. That is especially true in this day and age of secure connections, where the handshake to exchange TLS certificates can span several packets and take on the order of 50 ms.

    So while TCP keepalive can save you a millisecond per minute, HTTP keepalive can save you tens of milliseconds per request. For an app generating more than one HTTP request per minute, HTTP keep-alive can save a lot of time in the aggregate.

    TCP keepalive, on the other hand, will be negligible.

    See also: Relation between HTTP Keep Alive duration and TCP timeout duration

    So why does NEST include the option to turn on TCP keep-alive? Perhaps that’s a better question for a NEST developer. Speaking as a fellow engineer, sometimes it’s nice to include an option for every configurable possibility, if only for the sake of completeness.
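
For reference, here is a rough sketch of what "turning on TCP keep-alive" amounts to at the socket level. It is written in Python rather than C#, and it is not NEST's implementation. The helper name `enable_tcp_keepalive` and its default values are made up for illustration. The tuning options `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` are platform-specific (they exist on Linux), so the sketch only sets them when the platform exposes them.

```python
import socket


def enable_tcp_keepalive(sock, idle_s=60, interval_s=10, probes=3):
    """Ask the OS to send keepalive probes on an idle TCP socket.

    Hypothetical helper for illustration only; defaults are arbitrary.
    idle_s:     seconds of idleness before the first probe is sent
    interval_s: seconds between unanswered probes
    probes:     unanswered probes before the OS drops the connection
    """
    # The portable switch: enable keepalive probing on this socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Platform-specific tuning knobs; skipped where they are not defined.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_tcp_keepalive(s)
    # A non-zero value means keepalive probing is switched on.
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

Note that none of this reuses a connection for the next request, which is what HTTP keep-alive does. It only has the OS probe an idle connection and drop it once the probes go unanswered, as in the Wikipedia quote above.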