Tags: linux, sockets, tcp

CLOSE_WAIT TCP states despite closed file descriptors


My Linux server application listens on port 8000 and closes all its file descriptors (FDs) correctly using close().

Nevertheless, I sometimes observe up to 3000 CLOSE_WAIT TCP connections:

# netstat -antp | grep CLOSE_WAIT
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp      149      0 127.0.0.1:8000          127.0.0.1:49630         CLOSE_WAIT  -                   
tcp      236      0 127.0.0.1:8000          127.0.0.1:48440         CLOSE_WAIT  -                   
tcp      251      0 127.0.0.1:8000          127.0.0.1:41748         CLOSE_WAIT  -                   
tcp      149      0 127.0.0.1:8000          127.0.0.1:46064         CLOSE_WAIT  -                   
tcp      251      0 127.0.0.1:8000          127.0.0.1:56654         CLOSE_WAIT  -                   
tcp      251      0 127.0.0.1:8000          127.0.0.1:37502         CLOSE_WAIT  -                   
tcp      251      0 127.0.0.1:8000          127.0.0.1:56976         CLOSE_WAIT  -                   
tcp      251      0 127.0.0.1:8000          127.0.0.1:36416         CLOSE_WAIT  -                   
... ~3000 more of these ...

(netstat is running as root so there's no missing data.)

I know that CLOSE_WAIT occurs when the server application does not close() the FD of a connected socket. This is explained in the TCP state diagram of RFC793 (nicer rendering e.g. here) and also in e.g. https://blog.cloudflare.com/this-is-strictly-a-violation-of-the-tcp-specification/

But I know that my server does close() correctly because ls -1 "/proc/$(pidof myserver)/fd" | wc -l on the server process shows only 90 open FDs, not 3000.

Further evidence for correct closing is that netstat -p as shown above lists no program associated with these connections (the - in the PID/Program name column of the CLOSE_WAIT lines).

There are various other unsolved reports of CLOSE_WAIT entries being shown with - and no associated process.

So the question:

How can this many more CLOSE_WAIT states exist than open socket FDs?

Why is Linux contradicting itself regarding the output of /proc/$PID/fd and netstat? How can CLOSE_WAIT - occur at all, given that a CLOSE_WAIT connection must have an unclosed socket (FD) associated with it?


Solution

  • I figured it out:

    A CLOSE_WAIT state without associated process occurs when a client waiting in the Linux kernel's listen() backlog queue disconnects before the user-space application accept()s it.

    This is easily reproducible with netcat, see below.

    A summary of existing commentary on this question:

    • My intuition in the comments above was correct: if no associated process is shown, only the kernel is involved, not my program's file descriptors.
    • Commenters' suggestions that these process-less CLOSE_WAITs are created by the server forgetting to call close() are incorrect.
    • Commenters' suggestions that the kernel's display of /proc/<pid>/fd is somehow bugged are also incorrect.

    Repro with nc

    Short repro summary (read below for explanations):

    nc -l 1234
    nc localhost 1234
    nc localhost 1234         # press Ctrl+C here
    ss -tapn 'sport = :1234'  # shows process-less `CLOSE-WAIT`
    
    Terminal 1 ("server"):
    nc -v -l 127.0.0.1 1234
    

    This creates a socket, calls listen(..., 1) on it with a backlog queue length of 1, and calls accept() to wait for a connection.

    (Can be verified with strace.)

    Aside:

    • This queue is called the "accept queue" (there is a great in-depth article about it), containing connections waiting to be accept()ed, but I'll refer to it as the listen() queue here because its size is determined by listen() and its lifetime is determined by the socket returned by listen().
    • Linux (6.1.51 in my case) actually sets the real queue size to backlog + 1, so for a backlog of 1 the queue really has 2 slots. I haven't researched why this is, but it is mentioned e.g. here, and I've experimentally verified it: with a backlog of 1, 3 clients can connect to the above netcat server (1 accept()ed, 2 in the queue), and only the 4th one will hang without a connection (see the Python sketch after this list).
    • The passed backlog size can be observed as the Send-Q field in the ss -tlpn 'sport = :1234' output below.
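
    A rough Python sketch of the same experiment (assuming Linux; port 1235 and the 3-second timeout are arbitrary choices standing in for the nc setup above):

    #!/usr/bin/env python3

    import socket
    import time

    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(('127.0.0.1', 1235))
    server.listen(1)  # backlog of 1; Linux sizes the accept queue as backlog + 1

    accepted = None
    clients = []
    for i in range(1, 5):
      c = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
      c.settimeout(3)  # so a full queue shows up as a timeout instead of hanging
      try:
        c.connect(('127.0.0.1', 1235))
        time.sleep(0.1)  # let the kernel move the connection into the accept queue
        print(f"client {i}: connected")
      except socket.timeout:
        print(f"client {i}: connect() timed out -- the queue is full")
      clients.append(c)
      if i == 1:
        accepted, _ = server.accept()  # accept only the first client, like nc does

    This should print "connected" for clients 1 to 3 and a timeout for client 4.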

    Terminal 2 ("client A"):
    nc -v -4 127.0.0.1 1234
    

    This connection makes accept() return on the server.


    Terminal 3 ("client B"):
    nc -v -4 127.0.0.1 1234
    

    This connection fills an open slot in the server's listen() queue. It is not accept()ed.

    Now, press Ctrl+C to cancel this nc. This creates the CLOSE_WAIT state from the question.

    Observing ss output

    If we run the above repro with sudo watch -n1 --exec ss -tapn 'sport = :1234' running in another terminal, we can observe the state after each step:

    # After the server is started, we see the listening socket with a `Recv-Q` of `0`:
    
    State  Recv-Q Send-Q Local Address:Port Peer Address:Port Process
    LISTEN 0      1          127.0.0.1:1234      0.0.0.0:*    users:(("nc",pid=3613079,fd=3))
    
    # After client A is started, we see the listening socket with a `Recv-Q` of `0`
    # because client A was `accept()`ed:
    
    State  Recv-Q Send-Q Local Address:Port Peer Address:Port  Process
    LISTEN 0      1          127.0.0.1:1234      0.0.0.0:*     users:(("nc",pid=3613079,fd=3))
    ESTAB  0      0          127.0.0.1:1234    127.0.0.1:52190 users:(("nc",pid=3613079,fd=4))
    
    # After client B is started, we see the listening socket with a `Recv-Q` of `1`
    # because client B has not yet been `accept()`ed and is in the queue:
    
    State  Recv-Q Send-Q Local Address:Port Peer Address:Port  Process
    LISTEN 1      1          127.0.0.1:1234      0.0.0.0:*     users:(("nc",pid=3613079,fd=3))
    ESTAB  0      0          127.0.0.1:1234    127.0.0.1:42420
    ESTAB  0      0          127.0.0.1:1234    127.0.0.1:52190 users:(("nc",pid=3613079,fd=4))
    

    In the last step, we can already observe an ESTAB connection with Process being empty. This is because the connection is indeed established -- but only with the server's kernel, not the server process, since the process has not yet accept()ed the connection.

    The kernel performs the TCP three-way handshake (SYN, SYN-ACK, ACK) for us, so the TCP connection is established in the kernel before accept() happens.
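
    This is easy to demonstrate with a short Python sketch (port 1235 is an arbitrary free port assumed here): the client's connect() returns successfully even though the server only listen()s and never calls accept().

    #!/usr/bin/env python3

    import socket

    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(('127.0.0.1', 1235))
    server.listen(1)  # listen, but never call accept()

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(('127.0.0.1', 1235))  # succeeds: the kernel completed the handshake
    print("connect() returned although the server never called accept()")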

    Now after client B disconnects, we see the CLOSE-WAIT without process:

    State      Recv-Q Send-Q Local Address:Port Peer Address:Port  Process
    LISTEN     1      1          127.0.0.1:1234      0.0.0.0:*     users:(("nc",pid=3621113,fd=3))
    ESTAB      0      0          127.0.0.1:1234    127.0.0.1:52628 users:(("nc",pid=3621113,fd=4))
    CLOSE-WAIT 1      0          127.0.0.1:1234    127.0.0.1:45096
    

    And Recv-Q is still 1, so the disconnected connection is still in the kernel queue!

    Observing netstat output

    Same with netstat: sudo watch -n1 'netstat -antpe | grep 1234'

    There are some more lines because netstat cannot do the convenient 'sport = :1234' filtering that ss provides, so the client-side sockets show up too:

    # After the server is started:
    
    Proto Recv-Q Send-Q Local Address           Foreign Address         State       User       Inode      PID/Program name
    tcp        0      0 127.0.0.1:1234          0.0.0.0:*               LISTEN      1000       62603813   3621113/nc
    
    # After client A is started:
    
    Proto Recv-Q Send-Q Local Address           Foreign Address         State       User       Inode      PID/Program name
    tcp        0      0 127.0.0.1:1234          0.0.0.0:*               LISTEN      1000       62603813   3621113/nc
    tcp        0      0 127.0.0.1:52628         127.0.0.1:1234          ESTABLISHED 1000       62608860   3621978/nc
    tcp        0      0 127.0.0.1:1234          127.0.0.1:52628         ESTABLISHED 1000       62603814   3621113/nc
    
    # After client B is started:
    
    Proto Recv-Q Send-Q Local Address           Foreign Address         State       User       Inode      PID/Program name
    tcp        1      0 127.0.0.1:1234          0.0.0.0:*               LISTEN      1000       62603813   3621113/nc
    tcp        0      0 127.0.0.1:52628         127.0.0.1:1234          ESTABLISHED 1000       62608860   3621978/nc
    tcp        0      0 127.0.0.1:1234          127.0.0.1:52628         ESTABLISHED 1000       62603814   3621113/nc
    tcp        0      0 127.0.0.1:45096         127.0.0.1:1234          ESTABLISHED 1000       62609455   3622106/nc
    tcp        0      0 127.0.0.1:1234          127.0.0.1:45096         ESTABLISHED 0          0          -
    

    Again, we already see the sought-after - as PID/Program name here, for the still-ESTABLISHED connection.

    This also answers the question of how an ESTABLISHED connection can be shown without an associated process.

    And after the disconnect of client B:

    Proto Recv-Q Send-Q Local Address           Foreign Address         State       User       Inode      PID/Program name
    tcp        1      0 127.0.0.1:1234          0.0.0.0:*               LISTEN      1000       62603813   3621113/nc
    tcp        0      0 127.0.0.1:52628         127.0.0.1:1234          ESTABLISHED 1000       62608860   3621978/nc
    tcp        0      0 127.0.0.1:1234          127.0.0.1:52628         ESTABLISHED 1000       62603814   3621113/nc
    tcp        0      0 127.0.0.1:45096         127.0.0.1:1234          FIN_WAIT2   0          0          -
    tcp        1      0 127.0.0.1:1234          127.0.0.1:45096         CLOSE_WAIT  0          0          -
    

    That's our CLOSE_WAIT with - process.

    More minimal repro with Python

    Since nc may change exactly what it does over time (e.g. which system calls it makes), here's a similar TCP server in Python to make it extra clear what's happening on the server:

    #!/usr/bin/env python3
    
    import socket
    
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:  # TCP
      s.bind(('127.0.0.1', 1234))
      s.listen(1)  # backlog of 1, like the nc server above
      while True:
        conn, addr = s.accept()
    
        print(f"Got client {addr} as FD {conn.fileno()}")
        action = input("Press enter to close the current connection and call accept() again, or enter 'close-socket' to close the entire socket... ").strip()
    
        if action == "close-socket":
          s.close()
          input("Socket closed, press enter to terminate... ")
          exit()
        else:
          conn.close()
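
    To reproduce the process-less CLOSE_WAIT against this server: start it, connect a first client (e.g. with nc -v -4 127.0.0.1 1234) so that the server sits at the input() prompt, and then run something like this hypothetical "client B", which connects and disconnects without ever being accept()ed:

    #!/usr/bin/env python3

    import socket

    # Connect while the server above is paused at its input() prompt, then
    # disconnect immediately. The connection never gets accept()ed, so after
    # the close() it shows up as a CLOSE_WAIT without an associated process.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as c:
      c.connect(('127.0.0.1', 1234))
    # Leaving the `with` block close()s the socket, which sends the FIN.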
    

    Why the CLOSE_WAIT without process exists and how to get rid of it

    The kernel keeps the disconnected CLOSE_WAIT connection in the queue defined by listen() without an associated process until either:

    • the server close()s the socket returned by listen(), or
    • the server accept()s the already-disconnected connection.

    Accepting the already-disconnected connection will succeed: accept() will hand the server an FD. This converts the CLOSE_WAIT without process into a CLOSE_WAIT with process. The server may now call close() on the FD to close the connection and resolve the CLOSE_WAIT state.
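
    A compact Python sketch of that second path (port 1236 is an arbitrary free port assumed here): the client disconnects before the server ever calls accept(), yet accept() still returns an FD, recv() yields EOF, and close() resolves the CLOSE_WAIT.

    #!/usr/bin/env python3

    import socket
    import time

    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(('127.0.0.1', 1236))
    server.listen(1)

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(('127.0.0.1', 1236))
    client.close()   # the peer sends FIN; the queued server-side entry becomes CLOSE_WAIT
    time.sleep(0.1)  # give the kernel a moment to process the FIN

    conn, addr = server.accept()        # succeeds although the peer is already gone
    print("accept() returned FD", conn.fileno())
    print("recv():", conn.recv(4096))   # b'' -- EOF from the disconnected peer
    conn.close()                        # this resolves the CLOSE_WAIT state
    server.close()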

    Calling close() on the socket returned by listen() tears down the entire kernel queue, thus CLOSE_WAIT disappears immediately.

    Summary

    • Process-less CLOSE_WAITs are connections that entered the listen() queue and were terminated by the other side sending a TCP FIN before our process accept()ed them (i.e. before they were taken out of the queue).
    • They live in the kernel.
    • They have no file descriptor (FD) associated with them, because the accept() that would assign them an FD has not happened yet.
    • This explains why there can be more CLOSE_WAITs than file descriptors.

    The question's problem with 3000 such process-less CLOSE_WAITs suggests that the server is not accept()ing connections. This might be due to a bug, or because the server process is busy doing something else (e.g. garbage collection or other long-running work). Thus the queue fills up. The server must have called listen(..., 3000) or higher; indeed, I can see that Send-Q is 4096 in ss -tlpn 'sport = :8000'. When the queued clients eventually give up due to some timeout, the queued ESTABLISHED connections become queued CLOSE_WAIT connections.
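
    As a side note, Linux also counts connections it had to drop because an accept queue was full. A quick sketch (assuming Linux; ListenOverflows and ListenDrops are the standard TcpExt counters in /proc/net/netstat) to check whether the backlog is not just filling up but already overflowing:

    #!/usr/bin/env python3

    # /proc/net/netstat consists of pairs of header/value lines per protocol extension.
    with open('/proc/net/netstat') as f:
      lines = f.read().splitlines()

    for header, values in zip(lines[::2], lines[1::2]):
      if header.startswith('TcpExt:'):
        stats = dict(zip(header.split()[1:], map(int, values.split()[1:])))
        print('ListenOverflows:', stats.get('ListenOverflows'))
        print('ListenDrops:    ', stats.get('ListenDrops'))

    If these counters keep increasing, clients are already being turned away because the listen() queue is full.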

    Thus, the next step for solving the issue with up to 3000 CLOSE_WAIT connections should be figuring out why the server stops calling accept().