Tags: visual-studio, gdb, visual-studio-2019, remote-debugging

Intermittent connection failure to remote machine when remote (gdb) debugging with VS2019


I'm compiling and debugging native C++ code on a Linux VM hosted (with Hyper-V) on the same machine on which I'm running Visual Studio 2019 (Enterprise Version 16.11.1). The remote connection only works most of the time. About 15%-20% of the time when I attempt to start the build or a debug session, it fails with:

"Could not connect to the remote system. Please verify your connection settings, and that your machine is on the network and reachable."

There is no reliable time period after which this happens (or doesn't). I can successfully remote-compile, then try to start debugging two (2) seconds later, and have it fail. Once it fails, I go to Tools > Options > Cross Platform > Connection Manager > [highlight already-selected connection] > Verify and get a dialog box saying "Connection verified.", which shows that it is in fact able to connect.

It can work fine many times in a row, and then suddenly fail. Once it fails, I have to close Visual Studio and re-open it to make it start working properly again. Through experimentation, I've found that I can also change the connection to a different remote host, and then back to the original again, to make it start working again, but that takes longer than just bouncing VS2019. It is becoming a real PITA to have to restart VS2019 every few minutes while developing.

Are others experiencing this intermittent failure, and does anyone have any ideas regarding what causes it or how it can be resolved (or even worked around faster than my current method)?

The tail of the remote connections log after a failed attempt to start a debugging session is:

07:13:06.4516823 [Info, Thread 82] liblinux.RemoteSystemBase: Connecting over SSH to 10.10.10.10:22
07:13:06.6101023 [Info, Thread 82] liblinux.Shell.CommonCommandBase: Command "g++ -v" finished with exit code 0 after 46.0657ms
07:13:06.6127453 [Info, Thread 82] liblinux.Shell.CommonCommandBase: Command "clang++ -v" finished with exit code 127 after 2.1315ms
07:13:06.6151614 [Info, Thread 82] liblinux.Shell.CommonCommandBase: Command "gdbserver --version" finished with exit code 0 after 2.3438ms
07:13:06.6181017 [Info, Thread 82] liblinux.Shell.CommonCommandBase: Command "gcc -v" finished with exit code 0 after 2.8722ms
07:13:06.6634892 [Info, Thread 82] liblinux.Shell.CommonCommandBase: Command "gdb -v" finished with exit code 0 after 45.5621ms
07:13:06.7094904 [Info, Thread 82] liblinux.Shell.CommonCommandBase: Command "/usr/bin/gdb -v" finished with exit code 0 after 45.7139ms
07:13:06.7114880 [Info, Thread 82] liblinux.Shell.CommonCommandBase: Command "/usr/local/bin/gdb -v" finished with exit code 127 after 2.5014ms
07:13:06.7184905 [Info, Thread 82] liblinux.Shell.CommonCommandBase: Command "rsync -v" finished with exit code 1 after 6.6996ms
07:13:06.7209159 [Info, Thread 82] liblinux.Shell.CommonCommandBase: Command "lldb -v" finished with exit code 127 after 2.1041ms
07:13:06.7235831 [Info, Thread 82] liblinux.Shell.CommonCommandBase: Command "ninja --version" finished with exit code 0 after 2.6598ms
07:13:06.7265292 [Info, Thread 82] liblinux.Shell.CommonCommandBase: Command "cmake --version" finished with exit code 0 after 2.6648ms
07:13:06.7284878 [Info, Thread 82] liblinux.Shell.CommonCommandBase: Command "make -v" finished with exit code 0 after 2.3541ms
07:13:06.7324881 [Info, Thread 82] liblinux.IO.RemoteFileSystemImpl: Connecting over SFTP to 10.10.10.10:22
07:13:06.8813322 [Info, Thread 82] liblinux.Shell.CommonCommandBase: Command "cat /etc/os-release" finished with exit code 0 after 3.1399ms
07:13:06.8842647 [Info, Thread 82] liblinux.Shell.CommonCommandBase: Command "uname -m" finished with exit code 0 after 2.6496ms
07:13:06.8867628 [Info, Thread 82] liblinux.Shell.CommonCommandBase: Command "uname -r" finished with exit code 0 after 2.3968ms
07:13:06.8872544 [Info, Thread 82] liblinux.RemoteSystemBase: Disconnecting over SSH from "10.10.10.10:22"
07:13:06.8872544 [Info, Thread 82] liblinux.IO.RemoteFileSystemImpl: Disconnecting over SFTP from 10.10.10.10:22
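
One way to confirm that nothing in a log like this failed at the connection level is to list only the commands that exited non-zero. A minimal sketch, assuming the log line format shown above; `failed_commands` is a hypothetical helper name, not part of Visual Studio:

```python
import re

# Matches the 'Command "..." finished with exit code N' part of a log line.
# The format is assumed from the log excerpt above.
COMMAND_RE = re.compile(
    r'Command "(?P<command>[^"]+)" finished with exit code (?P<code>\d+)'
)


def failed_commands(log_text):
    """Return (command, exit_code) pairs for commands that exited non-zero."""
    failures = []
    for line in log_text.splitlines():
        match = COMMAND_RE.search(line)
        if match and int(match.group("code")) != 0:
            failures.append((match.group("command"), int(match.group("code"))))
    return failures


if __name__ == "__main__":
    # Two lines copied from the log excerpt above.
    sample = (
        '07:13:06.6101023 [Info, Thread 82] liblinux.Shell.CommonCommandBase: '
        'Command "g++ -v" finished with exit code 0 after 46.0657ms\n'
        '07:13:06.6127453 [Info, Thread 82] liblinux.Shell.CommonCommandBase: '
        'Command "clang++ -v" finished with exit code 127 after 2.1315ms\n'
    )
    print(failed_commands(sample))
```

In a POSIX shell, exit code 127 usually means the command was not found, so entries such as `clang++ -v` and `lldb -v` most likely just reflect tools that are not installed on the VM rather than a connection problem.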


Solution

  • To view SSH logging in real time on the remote machine, SSH to it and run:

    $ sudo journalctl -f -u ssh

    (Optional: you can set the log level, e.g. to DEBUG or INFO, with the LogLevel directive in the SSH daemon configuration file.) On Debian, the SSH daemon configuration file is here:

    /etc/ssh/sshd_config

    You'll find that Visual Studio opens several SSH sessions to the remote machine, and closes most of them almost immediately after opening them. Each time you remote-compile or remote-debug, you'll see several sessions open and quickly close. There seems to be one or more, however, that persist. They eventually time out and are shut down by the remote SSH daemon, as evidenced by one or more logged message(s):

    sshd[{*nix_process_id}]: Timeout, client not responding from user {user} {ip_address} port {random_port#}

    (where {*nix_process_id}, {user}, {ip_address}, and {random_port#} are replaced by the actual values).

    It is immediately after these sessions time out that Visual Studio decides it can no longer connect (even though it can). This seems like a Visual Studio bug, but I've found no info regarding it online.
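
To correlate these timeouts with Visual Studio failures without watching the journal by hand, the timeout lines can be picked out with a small script. A minimal sketch, assuming the sshd message format quoted above; the function name `parse_timeout` and the sample line (host name, PID, user, address, port) are hypothetical:

```python
import re

# Pattern for the sshd timeout message quoted above. The exact wording is
# assumed from that message and may differ between OpenSSH versions.
TIMEOUT_RE = re.compile(
    r"sshd\[(?P<pid>\d+)\]: Timeout, client not responding "
    r"from user (?P<user>\S+) (?P<ip>\S+) port (?P<port>\d+)"
)


def parse_timeout(line):
    """Return pid, user, ip and port for a timeout line, or None otherwise."""
    match = TIMEOUT_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["pid"] = int(info["pid"])
    info["port"] = int(info["port"])
    return info


if __name__ == "__main__":
    # Hypothetical sample line; real input could be the saved output of
    # `journalctl -u ssh`, fed to parse_timeout one line at a time.
    sample = ("Aug 20 07:13:06 devvm sshd[1234]: Timeout, client not "
              "responding from user dev 10.10.10.1 port 50522")
    print(parse_timeout(sample))
```

Comparing the timestamps of the matching lines with the moments Visual Studio reports "Could not connect to the remote system" should show whether the two really coincide.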

    My workaround is to set the following in the SSH daemon configuration file:

    MaxSessions 100
    TCPKeepAlive yes
    ClientAliveCountMax 3
    ClientAliveInterval 180 
    

    MaxSessions defaults to 10, which seems borderline, as Visual Studio seems to use a half-dozen or more at a time. (Note that MaxSessions limits sessions per network connection, so it only matters if those sessions share a single connection.) Setting ClientAliveInterval to 180 causes the SSH daemon to send a null SSH packet to the client whenever it has received nothing from that client for 180 seconds, and ClientAliveCountMax sets the number of those packets that may go unacknowledged before the daemon times out the session.
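
The two ClientAlive settings multiply: the daemon gives up on a silent client after roughly ClientAliveInterval × ClientAliveCountMax seconds. A minimal sketch of that arithmetic, assuming plain integer values and no Match blocks in the file; `unresponsive_timeout` is a hypothetical helper name:

```python
def unresponsive_timeout(config_text):
    """Approximate seconds before sshd drops a client that stops responding.

    Returns None when ClientAliveInterval is 0, i.e. when the daemon sends
    no client-alive messages at all.
    """
    # OpenSSH defaults: ClientAliveInterval 0 (disabled), ClientAliveCountMax 3.
    settings = {"clientaliveinterval": 0, "clientalivecountmax": 3}
    seen = set()
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        key = parts[0].lower()  # sshd_config keywords are case-insensitive
        # sshd uses the first value it finds for each keyword.
        if key in settings and key not in seen:
            settings[key] = int(parts[1])
            seen.add(key)
    interval = settings["clientaliveinterval"]
    if interval == 0:
        return None
    return interval * settings["clientalivecountmax"]


if __name__ == "__main__":
    workaround = """
    MaxSessions 100
    TCPKeepAlive yes
    ClientAliveCountMax 3
    ClientAliveInterval 180
    """
    print(unresponsive_timeout(workaround))  # 3 * 180 seconds
```

With the values from the workaround, a client that stops answering is dropped after about 540 seconds (nine minutes).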

    TCPKeepAlive already defaults to yes in OpenSSH, so setting it here just makes that explicit. It does more or less the same thing as described above, but at the TCP level, and is likely redundant: it sends an unencrypted TCP keepalive packet, which also helps ensure the client firewall doesn't decide the conversation is over and close off the port.

    I'm not sure which of these mitigations is most responsible for improving the problem, and it is still not 100% solved. Visual Studio still randomly fails to acknowledge the null packets, particularly when debugging, which leads the SSH daemon on the Linux host to close the session(s) that Visual Studio seems to require to stay open. But it is improved by an order of magnitude: it now fails less than 2% of the time.
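
Because it is unclear which setting matters most, it is worth confirming that the daemon actually picked up the new values after it was restarted. `sshd -T` prints the effective configuration as lowercase `keyword value` lines; the sketch below compares such a dump against the workaround values. It is a minimal sketch, and `mismatched_settings` and the sample dump are hypothetical:

```python
def mismatched_settings(dump_text, wanted):
    """Return {keyword: (wanted, actual)} for settings that differ.

    dump_text is expected to hold `keyword value` lines, as printed by
    `sshd -T`. A keyword missing from the dump is reported with actual=None.
    """
    actual = {}
    for line in dump_text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) == 2:
            # Keep the first occurrence of each keyword.
            actual.setdefault(parts[0].lower(), parts[1])
    result = {}
    for key, value in wanted.items():
        found = actual.get(key.lower())
        if found != str(value):
            result[key] = (str(value), found)
    return result


if __name__ == "__main__":
    # Hypothetical dump in which MaxSessions is still at its default.
    dump = (
        "maxsessions 10\n"
        "tcpkeepalive yes\n"
        "clientaliveinterval 180\n"
        "clientalivecountmax 3\n"
    )
    wanted = {
        "MaxSessions": 100,
        "TCPKeepAlive": "yes",
        "ClientAliveInterval": 180,
        "ClientAliveCountMax": 3,
    }
    print(mismatched_settings(dump, wanted))
```

The dump could be produced on the VM with `sudo sshd -T` and saved to a file; an empty result means all four settings are in effect.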