c# asp.net asp.net-core asp.net-core-webapi signalr-hub

Server-side SignalR connection fails after significant uptime

I've searched numerous other questions related to SignalR connections on StackOverflow, but none of them seem to apply to my specific case.

I have an application that uses a SignalR hub. A client can connect to the hub in two ways:

1. Via a .NET Core API that uses an underlying client to connect to the hub
2. By connecting directly to the URL of the hub

The issue is with connections made through the .NET Core API (method 1). When the server-side application has been running for a long time (around 2 weeks), the SignalR connection that the API uses fails. Direct connections to the SignalR hub (method 2) continue to work.

Here's how connection works via the API:

.NET Core Web API

[Route("~/api/heartbeat")]
[HttpPost]
public async Task SendHeartbeat(string nodeId) {
    await SignalRClient.SendHeartbeat(nodeId);
    ...
}

SignalRClient

public static class SignalRClient
{

    private static HubConnection _hubConnection;

    /// <summary>
    /// Static SignalRHub client - to ensure that a single connection to the SignalRHub is re-used,
    /// and to prevent excessive connections that cause SignalR to fail
    /// </summary>
    static SignalRClient()
    {
        string signalRHubUrl = "...someUrl";

        _hubConnection = new HubConnectionBuilder()
        .WithUrl(signalRHubUrl)
        .Build();

        _hubConnection.Closed += async (error) =>
        {
            // error is null when the connection was closed intentionally, hence the null-conditional access
            Log.Error("SignalR hub connection was closed - reconnecting. Error message - " + error?.Message);

            await Task.Delay(new Random().Next(0, 5) * 1000);
            try
            {
                Log.Error("About to reconnect");
                await _hubConnection.StartAsync();
                Log.Error("Reconnect now requested");
            }
            catch (Exception ex)
            {
                Log.Error("Failed to restart connection to SignalR hub, following a disconnection: " + ex.Message);
            }
        };

        InitializeConnection();
    }

    private static async void InitializeConnection()
    {
        try
        {
            Log.Information("Checking hub connection status");
            if (_hubConnection.State == HubConnectionState.Disconnected)
            {
                Log.Information($"Starting SignalRClient using signalRHubUrl");
                await _hubConnection.StartAsync();
                Log.Information("SignalRClient started successfully");
            }
        }
        catch (Exception ex)
        {
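            // InnerException is often null, so guard the access to avoid a NullReferenceException inside the catch block
            Log.Error("Failed to start connection to SignalRClient : " + ex.Message + ", " + ex.InnerException?.Message);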
        }
    }

    public static async Task SendHeartbeat(string nodeId)
    {
        try
        {
            Log.Information("Attempting to send heartbeat to SignalRHub");
            await _hubConnection.InvokeAsync("SendNodeHeartbeatToMonitors", nodeId);
        }
        catch (Exception ex)
        {
            Log.Error($"Error when sending heartbeat to SignalRClient  for NodeId: {nodeId}. Error: {ex.Message}");
        }
    }
}

After about 2 weeks of uptime, the connection fails and doesn't recover. I can see this error in the log:

Error when sending transaction to SignalRClient from /api/heartbeat: The 'InvokeCoreAsync' method cannot be called if the connection is not active

I don't understand how this is happening, as I'm handling the _hubConnection.Closed event in the SignalRClient to deal with a closed connection. The handler executes await _hubConnection.StartAsync(); to restart the connection, as shown in the code above.

The connection is regularly being closed for some reason (about every 30 minutes), but it usually recovers, and I see the following error in the log:

SignalR hub connection was closed - reconnecting. Error message - The remote party closed the WebSocket connection without completing the close handshake.

This shows that the code is successfully entering the _hubConnection.Closed handler (as this is where I log that message), so it appears that the connection is usually restarted successfully.

So, why does the connection sometimes fail completely and then fail to be restarted? I'm wondering if I'm connecting to the SignalR hub in a sensible way (in particular, I'm wondering if using a static class for the SignalRClient is a good pattern). And I'm wondering if my actual problem is all of those "The remote party closed the WebSocket connection without completing the close handshake." errors. If that's the case, what could be causing them?

Any suggestions that point me in the right direction are greatly appreciated.

Solution

I encountered this same problem a few years ago, which I solved at the time by placing all calls to StartAsync in their own task. And while I could be wrong about this, my own experiments indicated that the HubConnection itself isn't reusable, and thus also needs to be recreated after a disconnect.

So essentially I have a function called "CreateHubConnection" which does what you'd expect it to, and I have an async method to initiate server connections that looks like this:

private async Task ConnectToServer()
{
    // keep trying until we manage to connect
    while (true)
    {
        try
        {
            await CreateHubConnection();
            await this.Connection.StartAsync();
            return; // yay! connected
        }
        catch (Exception e) { /* bugger! */}
    }
}
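
As a side note, the loop above retries immediately and never looks at the cancellation token. A minimal sketch of the same idea with a pause between attempts and cooperative cancellation could look like the following. The names ConnectRetry and UntilConnectedAsync and the delay parameter are assumptions made for illustration, not part of my original code or of the SignalR API:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ConnectRetry
{
    // Calls connectOnce repeatedly until it completes without throwing.
    // Returns the number of attempts that were made.
    public static async Task<int> UntilConnectedAsync(
        Func<CancellationToken, Task> connectOnce,
        TimeSpan delayBetweenAttempts,
        CancellationToken token)
    {
        int attempts = 0;
        while (true)
        {
            // Stop retrying as soon as cancellation has been requested.
            token.ThrowIfCancellationRequested();
            attempts++;
            try
            {
                await connectOnce(token);
                return attempts; // connected
            }
            catch (Exception) when (!token.IsCancellationRequested)
            {
                // The attempt failed; wait before the next one so the loop does not spin.
                // Task.Delay throws if the token is cancelled during the wait.
                await Task.Delay(delayBetweenAttempts, token);
            }
        }
    }
}
```

In the context of this answer, the delegate passed as connectOnce would be something like async ct => { await CreateHubConnection(); await this.Connection.StartAsync(ct); }, with this.Cancel.Token as the token.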

My initial connection runs this in a new task:

this.Cancel = new CancellationTokenSource();
Task.Run(async () => await ConnectToServer(), this.Cancel.Token);

And the Connection.Closed handler also launches it in a new task:

// Closed is declared as Func<Exception, Task>, so the handler takes the error as a parameter
this.Connection.Closed += async (error) =>
{
    try
    {
        await Task.Delay(1000); // don't want to hammer the network
        this.Cancel = new CancellationTokenSource();
        await Task.Run(async () => await ConnectToServer(), this.Cancel.Token);
    }
    catch (Exception) { /* give up */ }
};

I don't know why this is necessary, but calling StartAsync directly from the Closed handler seems to create some kind of deadlock inside the SignalR library. I never did track down the exact cause of this. It could have been because my original call to StartAsync was made from the GUI thread. Running the connection attempts in their own tasks, creating a new HubConnection each time, and disposing of old HubConnections that were no longer needed fixed it.
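
To make the "new HubConnection each time, dispose the old one" part more concrete, here is a rough sketch of how the pieces might fit together. It is not my original code: the class name HubReconnector, the hubUrl constructor parameter, and the one-second delay are assumptions for illustration, and it needs the Microsoft.AspNetCore.SignalR.Client package.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.SignalR.Client;

public sealed class HubReconnector
{
    private readonly string _hubUrl;            // hypothetical: supplied by the caller
    private readonly CancellationToken _token;

    // The most recently started connection, or null before the first success.
    public HubConnection Current { get; private set; }

    public HubReconnector(string hubUrl, CancellationToken token)
    {
        _hubUrl = hubUrl;
        _token = token;
    }

    // Builds a fresh HubConnection and starts it, retrying until it succeeds.
    public async Task ConnectAsync()
    {
        while (true)
        {
            _token.ThrowIfCancellationRequested();

            var connection = new HubConnectionBuilder().WithUrl(_hubUrl).Build();
            connection.Closed += OnClosedAsync; // every new instance needs its own handler
            try
            {
                await connection.StartAsync(_token);

                var old = Current;
                Current = connection;
                if (old != null)
                {
                    // Detach first so disposing the old instance cannot trigger another reconnect.
                    old.Closed -= OnClosedAsync;
                    await old.DisposeAsync();
                }
                return;
            }
            catch (Exception) when (!_token.IsCancellationRequested)
            {
                // This attempt failed: discard the instance and pause before trying again.
                connection.Closed -= OnClosedAsync;
                await connection.DisposeAsync();
                await Task.Delay(TimeSpan.FromSeconds(1), _token);
            }
        }
    }

    // Runs the reconnect on a separate task instead of calling StartAsync inside the handler.
    private Task OnClosedAsync(Exception error)
    {
        return Task.Run(() => ConnectAsync());
    }
}
```

Callers would send through Current (for example Current.InvokeAsync(...)), keeping in mind that it is null until the first connection attempt succeeds.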

I would be very interested to hear if someone with more knowledge of this has a better or easier solution.