Tags: c++, linux, sockets, tcp, netstat

accept() blocks indefinitely on repeated connection attempts


I'm writing a TCP server that should accept a single connection at a time, reusing the same address and port for listening. The first connection to a freshly started instance of the server (e.g. via netcat) always succeeds, but subsequent connection attempts hang: accept() never returns a socket descriptor. I've experimented with different queue lengths, and with connecting both while the previous connection is in the TIME_WAIT state and after it has cleared, but the result is the same.

Both netcat and netstat report that the new connection attempt succeeded and that the connection is established (regardless of whether the previous connection is still in TIME_WAIT or has expired), but my server is stuck in the accept() call, so it never registers the new connection. This doesn't always happen on the very first reconnection attempt, but it almost always happens within the first three.

The code:


int main() {
    Socket socket(10669);

    while (true) {
        socket.establish_connection();

        socket.receive(callback);  // callback: a SocketCallback (not shown)
        socket.close_connection();
    }
}



void Socket::establish_connection() {
    // Creating socket file descriptor; ::socket is qualified because the
    // member variable `socket` would otherwise hide the global function
    int server_fd = ::socket(AF_INET, SOCK_STREAM, 0);
    if (server_fd < 0) {  // socket() returns -1 on failure, not 0
        throw ...;
    }

    // Setting socket options
    int socket_options = 1;
    if (setsockopt(server_fd, SOL_SOCKET, SO_REUSEPORT, &socket_options, sizeof(socket_options))) {
        throw ...;
    }

    struct sockaddr_in address {};  // zero-initialised, including sin_zero
    address.sin_family = AF_INET;
    address.sin_addr.s_addr = htonl(INADDR_ANY);
    address.sin_port = htons(port);

    if (bind(server_fd, (sockaddr *) &address, sizeof(address)) < 0) {
        throw ...;
    }

    if (listen(server_fd, 1) < 0) {
        throw ...;
    }

    spdlog::info("Listening for clients on port {}", port);

    // this is where it blocks at repeated connection attempts
    struct sockaddr_in client_address;
    socklen_t addrlen = sizeof(client_address);
    if ((socket = accept(server_fd, (sockaddr *) &client_address, &addrlen)) < 0) {
        throw ...;
    }

    spdlog::info("Client connected");  // spdlog appends the newline itself
}


void Socket::receive(SocketCallback callback) {
    while (true) {
        fd_set read_socket_fd;
        FD_ZERO(&read_socket_fd);
        FD_SET(socket, &read_socket_fd);

        int sel = select(socket+1, &read_socket_fd, NULL, NULL, NULL);

        if (sel > 0) {
            // receiving data, no problems here
        }
    }
}


void Socket::close_connection() {
    close(socket);
}

Some output from the server and from netstat:

On startup (server):

[2020-07-07 13:33:53.387] [info] Socket initialised to use port 10669
[2020-07-07 13:33:53.387] [info] Listening for clients on port 10669

On startup (netstat):

tcp        0      0 0.0.0.0:10669           0.0.0.0:*               LISTEN

On first connection (server):

[2020-07-07 13:34:35.481] [info] Client connected

On first connection (netstat):

tcp        0      0 0.0.0.0:10669           0.0.0.0:*               LISTEN
tcp        0      0 localhost:54860         localhost:10669         ESTABLISHED
tcp        0      0 localhost:10669         localhost:54860         ESTABLISHED

On first disconnect from the client (server):

[2020-07-07 13:35:47.903] [warning] Client disconnected
[2020-07-07 13:35:47.903] [info] Listening for clients on port 10669

On first disconnect from the client (netstat):

tcp        0      0 0.0.0.0:10669           0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:10669           0.0.0.0:*               LISTEN
tcp        0      0 localhost:54860         localhost:10669         TIME_WAIT

On the second connection attempt the server prints nothing; it stays stuck after the "Listening for clients..." line, i.e. blocked in accept(). This is what netstat reports (here I connected immediately after the first disconnect, so while the previous connection was still in TIME_WAIT):

tcp        0      0 0.0.0.0:10669           0.0.0.0:*               LISTEN
tcp        1      0 0.0.0.0:10669           0.0.0.0:*               LISTEN
tcp        0      0 localhost:54968         localhost:10669         TIME_WAIT
tcp        0      0 localhost:54970         localhost:10669         ESTABLISHED
tcp        0      0 localhost:10669         localhost:54970         ESTABLISHED

The same happens when I wait for the TIME_WAIT entry to expire before connecting:

tcp        0      0 0.0.0.0:10669           0.0.0.0:*               LISTEN
tcp        1      0 0.0.0.0:10669           0.0.0.0:*               LISTEN
tcp        0      0 localhost:10669         localhost:55134         ESTABLISHED
tcp        0      0 localhost:55134         localhost:10669         ESTABLISHED

In both cases the connection is active in netcat and I can type freely, but the server receives nothing. There are no other processes that could intercept the connection.

I know I could switch to a non-blocking accept(), but the blocking behaviour fits my use case perfectly when it works as intended. So the question is: why does accept() block on reconnects, and what am I missing?


Solution

  • You are supposed to create one listening (server) socket and then call accept() repeatedly on that same socket. You seem to be creating a new server socket every time you call accept(), and leaving the old ones open: close_connection() closes only the client socket, never server_fd.

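    Normally the second bind() would fail with EADDRINUSE, but you set SO_REUSEPORT, which tells the operating system that you really do want several sockets bound to the same port; your netstat output shows two LISTEN entries on port 10669 after the first disconnect. With SO_REUSEPORT, incoming connections are balanced across all the listening sockets on the same port. Apparently, the operating system chose to send your new connection to the first socket (netstat shows it queued there: one LISTEN entry has a Recv-Q of 1, i.e. a connection waiting for accept()), and then you tried to accept it from the second one, where no connection was waiting.

    On Linux the listener is picked from a hash of the connection's source and destination addresses and ports, so a reconnection sometimes happens to land on the newest socket and succeeds. Each pass through the loop also adds another listener, which makes a miss more likely; that would explain why the failure usually appears within the first few attempts.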

    To fix it, create a server socket once and then always accept from that same socket.