CLOSED error when establishing lots of connections with gen_tcp in parallel (Bug?)


When trying to establish a fairly large number of TCP connections in parallel, I observe some odd behavior that I consider a potential bug in gen_tcp.

The scenario is a server listening on a port with multiple concurrent acceptors. From a client I establish a connection by calling gen_tcp:connect/3, then send a "Ping" message to the server and wait in passive mode for a "Pong" response. When the gen_tcp:connect/3 calls are performed sequentially, everything works fine, even for a large number of connections (I tested up to ~28000).

The problem occurs when trying to establish many connections in parallel (depending on the machine, between ~75 and several hundred). While most of the connections still get established, some fail with a closed error in gen_tcp:recv/3. The odd thing is that these connections did not fail earlier: the calls to gen_tcp:connect/3 and gen_tcp:send/2 were both successful (i.e. returned {ok, Socket} and ok, respectively). On the server side I don't see a matching connection for these "weird" connections, i.e. no gen_tcp:accept/1 call returns for them. It is my understanding that a successful gen_tcp:connect/3 should result in a matching accepted connection on the server side.

I have already filed a bug report, which contains a more detailed description and a minimal code example that demonstrates the problem. I was able to reproduce the problem on Linux and Mac OS X and with different Erlang versions.

My questions here are:

  1. Is anyone able to reproduce the problem and confirm that this is erroneous behavior?
  2. Any ideas for a workaround? How can I deal with this problem, other than starting all the connections sequentially (which takes forever)?

Solution

  • TCP 3-way handshake

            Client        Server
      connect()│──┐          │listen()
               │  └──┐       │
               │      SYN    │
               │        └──┐ │
               │           └▶│   STATE
               │          ┌──│SYN-RECEIVED
               │       ┌──┘  │
               │   SYN-ACK   │
               │ ┌──┘        │
       STATE   │◀┘           │
    ESTABLISHED│──┐          │
               │  └──┐       │
               │     └ACK    │
               │        └──┐ │   STATE
               │           └▶│ESTABLISHED
               ▽             ▽
    

    The problem lies in the finer details of the 3-way handshake that establishes a TCP connection, and in the queue for incoming connections at the listen socket. See this excellent article for details; much of the following explanation is based on it.

    On Linux there are actually two queues for incoming connections. When the server receives a connection request (SYN packet) and transitions to the state SYN-RECEIVED, the connection is placed in the SYN queue. When the corresponding ACK is received, the connection is moved to the accept queue for the application to consume. The {backlog, N} option to gen_tcp:listen/2 (default: 5) determines the length of the accept queue.
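
    As an illustration of the {backlog, N} option, here is a minimal sketch of a server that listens with a larger accept queue and keeps several acceptors waiting. The module name, function names, pool size and the "Ping"/"Pong" handling are assumptions made for this example, not code from the question. Note also that on Linux the kernel caps the effective backlog at the value of the net.core.somaxconn sysctl, so a large N may need that limit raised as well.

    ```erlang
    -module(backlog_listen).
    -export([listen/2, start_acceptors/2]).

    %% Open a passive-mode listen socket with an explicit accept-queue length.
    %% Backlog is an illustrative parameter; pick it to match the expected burst.
    listen(Port, Backlog) ->
        gen_tcp:listen(Port, [binary,
                              {active, false},
                              {reuseaddr, true},
                              {backlog, Backlog}]).

    %% Start N acceptor processes that all block in gen_tcp:accept/1.
    start_acceptors(ListenSocket, N) ->
        [spawn_link(fun() -> accept_loop(ListenSocket) end)
         || _ <- lists:seq(1, N)].

    accept_loop(ListenSocket) ->
        case gen_tcp:accept(ListenSocket) of
            {ok, Socket} ->
                %% Hand the socket to a worker so that this acceptor can
                %% return to accept/1 and drain the accept queue quickly.
                Pid = spawn(fun() -> receive go -> handle(Socket) end end),
                ok = gen_tcp:controlling_process(Socket, Pid),
                Pid ! go,
                accept_loop(ListenSocket);
            {error, _Reason} ->
                %% Listen socket closed (or another error): stop this acceptor.
                ok
        end.

    %% Hypothetical protocol handler: answer a 4-byte "Ping" with "Pong".
    handle(Socket) ->
        case gen_tcp:recv(Socket, 4) of
            {ok, <<"Ping">>} -> gen_tcp:send(Socket, <<"Pong">>);
            _ -> ok
        end,
        gen_tcp:close(Socket).
    ```

    A call such as backlog_listen:listen(Port, 1024) followed by backlog_listen:start_acceptors(ListenSocket, 16) would be one way to use it; both numbers are placeholders to tune, not recommendations.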

    When the server receives an ACK while the accept queue is full, the ACK is basically ignored and no RST is sent to the client. There is a timeout associated with the SYN-RECEIVED state: if no ACK is received (or the ACK is ignored, as is the case here), the server resends the SYN-ACK. The client then resends the ACK. If the application consumes an entry from the accept queue before the maximum number of SYN-ACK retries has been reached, the server will eventually process one of the duplicate ACKs and transition to the state ESTABLISHED. If the maximum number of retries has been reached, the server sends a RST to the client to reset the connection.

    This explains the behavior observed when starting lots of connections in parallel: the accept queue at the server fills up faster than the application consumes the accepted connections. The gen_tcp:connect/3 calls on the client side return successfully as soon as the client receives the first SYN-ACK. The connections do not get reset immediately because the server retries the SYN-ACK. The server does not report these connections as successful, because they are still in the state SYN-RECEIVED.
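
    Since connect and send can both succeed before the server has actually accepted the connection, the client only learns about the failure from gen_tcp:recv/3. One possible client-side mitigation is therefore to treat an error in the first recv as retryable and repeat the whole exchange. The following is a rough sketch under that assumption; the module name, the retry count, the 100 ms pause and the return values are hypothetical choices, not part of gen_tcp.

    ```erlang
    -module(ping_client).
    -export([ping/4]).

    %% Connect, send "Ping" and wait for "Pong". If any step fails, close the
    %% socket, pause briefly and retry, up to Retries attempts in total.
    ping(_Host, _Port, _Timeout, 0) ->
        {error, retries_exhausted};
    ping(Host, Port, Timeout, Retries) when Retries > 0 ->
        case attempt(Host, Port, Timeout) of
            ok ->
                ok;
            {error, _Reason} ->
                %% Give the server time to drain its accept queue.
                timer:sleep(100),
                ping(Host, Port, Timeout, Retries - 1)
        end.

    %% A single connect/send/recv round trip on a passive-mode socket.
    attempt(Host, Port, Timeout) ->
        case gen_tcp:connect(Host, Port, [binary, {active, false}], Timeout) of
            {ok, Socket} ->
                Result = exchange(Socket, Timeout),
                gen_tcp:close(Socket),
                Result;
            {error, Reason} ->
                {error, Reason}
        end.

    exchange(Socket, Timeout) ->
        case gen_tcp:send(Socket, <<"Ping">>) of
            ok ->
                case gen_tcp:recv(Socket, 4, Timeout) of
                    {ok, <<"Pong">>} -> ok;
                    {ok, Other}      -> {error, {unexpected_reply, Other}};
                    {error, Reason}  -> {error, Reason}  % e.g. closed
                end;
            {error, Reason} ->
                {error, Reason}
        end.
    ```

    Retrying only hides the symptom on the client; it adds load to a server whose accept queue is already full, so it is best combined with a larger backlog or more acceptors on the server side.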

    On BSD-derived systems (including Mac OS X) the queue for incoming connections works a bit differently; see the above-mentioned article.