linux network-programming ssh freebsd sshd

autossh tunnel hangs because of „Address already in use” regardless of all timeouts

I use autossh to create a remote tunnel with the following command (IPs and port changed):

autossh -M 0 -o "ServerAliveInterval 5" -o "ServerAliveCountMax 3" -f -T -N -i /root/.ssh/id_rsa -R 1602:localhost:443 root@123.123.123.123

And the server has this config in sshd:

        GatewayPorts yes
        ClientAliveInterval 10
        ClientAliveCountMax 6

This works like a charm most of the time, and timeouts and disconnects are handled very well. There is one exception: after only a very short interruption of the network connection, the client notices it and starts a reconnect, but the server has not noticed yet and still holds port 1602. The server log then shows the message: sshd[431646]: error: bind [::]:1602: Address already in use.

But autossh does not hang up and try again; it keeps the non-working tunnel open. A few seconds later, the server recognises the disconnect of the old tunnel and frees port 1602.

Now I have an autossh/ssh tunnel that does all the watchdog work (I can see the keep-alive message in the log every 5 seconds) and stays alive. The port on the server is now unused, and the tunnel does not work because the port is not allocated at all.

Autossh does not recover from this state without manual intervention. There are multiple ways to recover manually, but that is not the question.

My questions are:

Why does autossh not hang up and retry if the port is in use (that would solve the issue)? Or
How can I force the port to be freed and rebound to the new tunnel on reconnect? Or
How can I detect tunnels that have no port bound to them in order to kill them (for example, every minute in a cron job)?

I'm searching for a way to recover from this state automatically. I also wonder why this race condition is not mentioned anywhere on the internet, even though it can be reproduced easily.

Solution

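You need to add -o "ExitOnForwardFailure yes" as an autossh option. With it, ssh terminates the connection when it cannot set up the requested port forwarding, so autossh can start a fresh attempt instead of keeping a session alive without a working tunnel.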
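
Applied to the command from the question, this could look like the sketch below. The host, ports, and key path are the placeholder values from the question, and the start_tunnel wrapper function is only an illustrative name so the command is easy to reuse in a script:

```shell
#!/bin/sh
# Sketch: the tunnel command from the question with ExitOnForwardFailure added.
# Host, ports and key path are the placeholder values from the question;
# start_tunnel is a hypothetical wrapper name, not part of autossh.
start_tunnel() {
    # ExitOnForwardFailure makes ssh exit if the remote forward (-R)
    # cannot be bound, e.g. while the server still holds port 1602.
    autossh -M 0 \
        -o "ServerAliveInterval 5" \
        -o "ServerAliveCountMax 3" \
        -o "ExitOnForwardFailure yes" \
        -f -T -N \
        -i /root/.ssh/id_rsa \
        -R 1602:localhost:443 \
        root@123.123.123.123
}
```

Calling start_tunnel then starts the tunnel exactly as before; the only change is the extra -o option.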