Python: Trouble detecting dead sockets through select


I have some code which connects to a host and does nothing but listen for incoming data until either the client is shut down or the host sends a close statement. For this my code works well.

However, when the host dies without sending a close statement, my client keeps listening for incoming data forever, as expected. To resolve this I made the socket time out every foo seconds and start the process of checking whether the connection is alive or not. In the Python socket HOWTO I found this:

One very nasty problem with select: if somewhere in those input lists of sockets is one which has died a nasty death, the select will fail. You then need to loop through every single damn socket in all those lists and do a select([sock],[],[],0) until you find the bad one. That timeout of 0 means it won’t take long, but it’s ugly.

    # Example code written for this question.
    from select import select
    from socket import socket, AF_INET, SOCK_STREAM

    # Named `conn` so it does not shadow the imported `socket` class.
    conn = socket(AF_INET, SOCK_STREAM)
    conn.connect(('localhost', 12345))
    socklist = [conn]
    attempts = 0

    def check_socklist(socks):
        for sock in socks:
            (r, w, e) = select([sock], [], [], 0)

            ...
            ...
            ...

    while True:

        (r, w, e) = select(socklist, [], [], 60)

        for sock in r:
            if sock is conn:
                msg = sock.recv(4096)
                if not msg:
                    attempts += 1
                    if attempts >= 10:
                        check_socklist(socklist)
                    break
                else:
                    attempts = 0
                    print(msg)

This text raises three questions.

  1. I was taught that to check whether a connection is alive, one has to write to the socket and see if a response comes back. If none does, the connection has to be assumed dead. The text says that to check for bad connections, one singles out each socket, passes it as select's first parameter and sets the timeout to zero. How will this confirm whether the socket is dead?
  2. Why not test if the socket is dead or alive by trying to write to the socket instead?
  3. What am I looking for when the connection is alive and when it is dead? select will time out at once, so having no data there will prove nothing.

I realize there are libraries like gevent, asyncore and twisted that can help me with this, but I have chosen to do this myself to get a better understanding of what is happening and to get more control over the source myself.


Solution

  • If a connected client crashes or exits, but its host OS and computer are still running, then its OS's TCP stack will send your server a FIN packet to let your computer's TCP stack know that the TCP connection has been closed. Your Python app will see this as select() indicating that the client's socket is ready-for-read, and then when you call recv() on the socket, recv() will return an empty string (the Python equivalent of the C recv() returning 0). When that happens, you should respond by closing the socket.

    If, on the other hand, the connected client's computer never gets a chance to send a FIN packet (e.g. because somebody reached over and yanked its Ethernet cord or power cable out of the socket), then your server won't realize that the TCP connection is defunct for quite a while, possibly forever. The easiest way to avoid having a "zombie socket" is to have your server send some dummy data on the socket every so often, e.g. once per minute. The client should know to discard the dummy data. The benefit of sending the dummy data is that your server's TCP stack will then notice that it is not getting any ACK packets back for the data packet(s) it sent, and will resend them; after a few resends your server's TCP stack will give up and decide that the connection is dead. At that point select() will again report the socket as ready-for-read and the next recv() will fail (typically by raising an error rather than returning an empty string), so you should close the socket just as in the first case.
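The two cases above can be combined into one polling step. The following is only a minimal sketch, not a definitive implementation: the function name `poll_once`, the `HEARTBEAT` payload and the returned status strings are all hypothetical names chosen for illustration, and it assumes Python 3 and a peer that knows to discard the dummy byte.

```python
import select
import socket

# Hypothetical dummy payload; the peer must know to discard it.
HEARTBEAT = b"\x00"


def poll_once(sock, timeout):
    """Wait up to `timeout` seconds on `sock` and report what happened.

    Returns ("data", payload), ("closed", b"") or ("idle", b"").
    """
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        # Nothing arrived in time: send dummy data so the local TCP stack
        # has something to retransmit and can eventually notice a dead peer.
        try:
            sock.sendall(HEARTBEAT)
        except OSError:
            # The stack already knows the connection is gone.
            return ("closed", b"")
        return ("idle", b"")
    try:
        payload = sock.recv(4096)
    except OSError:
        # For example a connection reset, or retransmissions that gave up.
        return ("closed", b"")
    if not payload:
        # Readable but empty: the peer closed the connection (FIN received).
        return ("closed", b"")
    return ("data", payload)
```

In the loop from the question, this would replace the `select`/`recv` pair: call `poll_once(conn, 60)` repeatedly, handle `"data"`, and close the socket and stop as soon as `"closed"` comes back. Note that a single failed heartbeat is not reported immediately; the TCP stack only gives up after its own retransmission timeout, so `"closed"` may arrive several polls later.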