
Twisted - detection of a lost connection takes more than 30 minutes


I've written a TCP client using Python and Twisted. It connects to a server and communicates using a simple string-based protocol (defined by the server manufacturer). The TCP/IP connection should persist, and the client should reconnect in case of failure.

When some sort of network error occurs (I assume on the server side or on some node along the way), it takes the client a very long time, far more than a few minutes, to notice and initiate a new connection.

Is there a way to speed that up? Is there some sort of built-in TCP/IP keepalive functionality that can detect the disconnect sooner?

I can implement a keepalive mechanism myself and look for timeouts, but I'm not sure that's the best practice in this case. What do you think? Also, when using reactor.connectTCP() and reactor.run() with a ClientFactory, what's the best way to force a re-connection?


Solution

  • Application-level keepalives for TCP-based protocols are a good idea. You should probably implement this. This gives you complete and precise control over the timeout semantics you want from your application.

    TCP itself also has a keepalive mechanism. You can enable this with an ITCPTransport method call from your protocol. For example:

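    from twisted.internet.protocol import Protocol

    class YourProtocol(Protocol):
        def connectionMade(self):
            # Ask the OS to send TCP keepalive probes on this connection.
            self.transport.setTcpKeepAlive(True)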
    

    The exact semantics of this keepalive are platform- and configuration-dependent. It's entirely possible this is already enabled and is what's detecting your connection loss. Thirty minutes is a pretty plausible amount of time for this mechanism to notice a lost connection.