Tags: python · sockets · tcp · send · disconnect

Python: TCP broken route is painfully slow to detect


I'm having trouble with my server application written in Python 3 / asyncio (Protocol), but I'm fairly sure the problem is not really Python- or asyncio-related, because I've tried different versions and even a five-liner using just the socket interface. The application communicates concurrently with many clients, which are hardware TCP/IP<->RS232 converters. That's why asyncio is used instead of threads with blocking writes.

The application periodically sends short pieces of data. The problem occurs when I physically cut the connection and wait for an exception to occur:

asyncio - Fatal read error on socket transport protocol
<_SelectorSocketTransport fd=11 read=polling write=<idle, bufsize=0>>
Traceback (most recent call last):
File "/usr/lib/python3.5/asyncio/selector_events.py", line 663, in _read_ready
    data = self._sock.recv(self.max_size)
OSError: [Errno 113] No route to host

The exception does occur, but only after 15 minutes, which means that for 15 minutes I am signaling that everything is alright when it isn't. That is unbearably long and breaks the application's function. The behavior was checked on Ubuntu 16.04, Ubuntu 14.04 and Debian Jessie, all on different hardware.

I found that the kernel is (probably) buffering the data, because if I reconnect the device after ten minutes, all the data is flushed at once. I understand this is good for short disconnections; I would have no problem with 10 s, 15 s or even a minute, but 15 minutes is too much.

A similar question was answered by implementing an application-level protocol, which is not possible in my case. I just want to be sure the other side got the packet (TCP ACK) within some reasonable time. I carefully read the docs for socket.setsockopt but didn't find anything useful. I also didn't find a method to check whether the send buffer was flushed, which would allow a workaround: manual detection of the broken route.

TCP keep-alive does not help either, because it is based on inactivity time, and sending data counts as activity.
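
For illustration, this is roughly how keep-alive would be enabled (a minimal sketch; the interval values are made up, and the TCP_KEEP* options are Linux-specific):

    import socket

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Linux-specific tuning: first probe after 10 s of idle time,
    # then every 5 s, giving up after 3 unanswered probes.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 10)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 5)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 3)
    # Because the application keeps sending, the connection never
    # looks idle, so these probes never fire in this scenario.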


Solution

  • You are seeing TCP's retransmission timeout (RTO) behavior.

    Your TCP never receives any feedback¹ so it tries really hard to get the segments across. On Linux this behavior is governed by net.ipv4.tcp_retries2 = 15:

    This value influences the timeout of an alive TCP connection, when RTO retransmissions remain unacknowledged. Given a value of N, a hypothetical TCP connection following exponential backoff with an initial RTO of TCP_RTO_MIN would retransmit N times before killing the connection at the (N+1)th RTO.

    The default value of 15 yields a hypothetical timeout of 924.6 seconds and is a lower bound for the effective timeout. TCP will effectively time out at the first RTO which exceeds the hypothetical timeout.

    What this means is that your send apparently works (i.e. TCP has accepted your data and will try to deliver it eventually), and you then wait ~900 seconds while TCP keeps retrying.
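
    For reference, the current limit can be inspected through procfs; a minimal sketch (Linux-only path; lowering the value system-wide requires root):

        # tcp_retries2 is exposed via procfs on Linux.
        with open("/proc/sys/net/ipv4/tcp_retries2") as f:
            print("tcp_retries2 =", f.read().strip())  # default: 15

        # Lowering it affects every TCP connection on the host, e.g.:
        #   echo 8 | sudo tee /proc/sys/net/ipv4/tcp_retries2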

    Changing the application protocol is a robust way to fix this, but since you mention that doesn't work for you, your options revolve around asking TCP to give up sooner.

    TCP_USER_TIMEOUT seems to do exactly what you want:

    When the value is greater than 0, it specifies the maximum amount of time in milliseconds that transmitted data may remain unacknowledged before TCP will forcibly close the corresponding connection and return ETIMEDOUT to the application.

    Further details can be found in Application Control of TCP retransmission.
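
    A minimal sketch of applying it in your asyncio setting (assumptions: Linux; socket.TCP_USER_TIMEOUT exists as a constant only from Python 3.6 on, so the raw Linux option number 18 is used as a fallback; the 10-second limit and the ConverterProtocol name are made up):

        import asyncio
        import socket

        # Fall back to the raw Linux option number on Python < 3.6.
        TCP_USER_TIMEOUT = getattr(socket, "TCP_USER_TIMEOUT", 18)

        class ConverterProtocol(asyncio.Protocol):
            def connection_made(self, transport):
                sock = transport.get_extra_info('socket')
                # Abort the connection (ETIMEDOUT) when transmitted data
                # stays unacknowledged for 10 s (the value is in ms).
                sock.setsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT, 10000)
                self.transport = transport

            def data_received(self, data):
                pass  # handle the converter's RS232 payload here

    The same setsockopt call works on a plain socket before connect(), so the five-liner from the question can be fixed the same way.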

    Also didn't find method how to check if send buffer was flushed to do some workarounds-manual detection of broken route.

    The question linked above mentions SIOCOUTQ - checking the amount of data in the output queue - as exactly the workaround you describe.
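
    A minimal sketch of that ioctl from Python (an assumption on my part: on Linux SIOCOUTQ shares its value with termios' TIOCOUTQ, 0x5411):

        import fcntl
        import socket
        import struct
        import termios

        # On Linux, SIOCOUTQ == TIOCOUTQ: bytes in the send queue that
        # are not yet sent, or sent but not yet ACKed by the peer.
        SIOCOUTQ = getattr(termios, "TIOCOUTQ", 0x5411)

        def outq_bytes(sock):
            raw = fcntl.ioctl(sock.fileno(), SIOCOUTQ, struct.pack("i", 0))
            return struct.unpack("i", raw)[0]

    Polling this after each send gives the manual detection you describe: a value that stays above zero means the peer has not acknowledged the data, e.g. because the route is broken.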


    ¹ For example, it could receive a TCP RST or an ICMP unreachable message.