I've been looking for a solution to this problem for a long time, so now I'm asking here. Maybe someone has run into something similar and can give me some advice.
Regarding the configuration:
We have three Windows servers that we operate ourselves. On these servers we host various C# APIs built with .NET 8 and served through IIS. All three servers are configured identically, and a RabbitMQ cluster is hosted on them as well.
Cloudflare is used as a load balancer, but without session affinity, because session affinity requires Cloudflare's proxy mode and we don't want to use that. So far so good.
Now to the problem:
The API that uses the RabbitMQ cluster connects to the local RabbitMQ instance via localhost using Rebus. As far as I understand it, messages in the cluster are automatically distributed to all nodes.
[...]
services
    .AddSignalR( _ => {
        _.EnableDetailedErrors = true;
    } )
    .AddMessagePackProtocol( _ => {
        _.SerializerOptions = MessagePackSerializerOptions.Standard
            .WithSecurity( MessagePackSecurity.UntrustedData );
    } )
    .AddRebusBackplane<myhub>(); // changed for post
[...]
string rabbitMqConnectionString = $"amqp://rabbit:carrot@localhost:5672/myvhost"; // changed for post
services.AddRebus( configure => configure
    .Transport( x => {
        x.UseRabbitMq( rabbitMqConnectionString, GenerateTransientQueueName( "queuename" ) )
            .InputQueueOptions( o => {
                o.SetAutoDelete( true ); // queue is removed when its last consumer disconnects
                o.SetDurable( false );   // transient queue, does not survive a broker restart
            } );
    } ) );
services.AddSingleton<IHttpContextAccessor, HttpContextAccessor>();
[...]
app.UseEndpoints( endpoints => {
    endpoints.MapControllers();
    endpoints.MapHub<myhub>( "/myhub" ); // changed for post
} );
[...]
The API is accessed from a Next.js app through a SignalR hub. If access goes through the load balancer, then after a few page refreshes the HttpContext in the HttpContextAccessor is suddenly null. This happens even when the target host does not change.
[...]
_httpContextAccessor = httpContextAccessor;
_context = context;
_securityService = securityService;

if (_httpContextAccessor == null) throw new Exception( "HttpContextAccessor is null" );
if (_httpContextAccessor.HttpContext == null) throw new Exception( "HttpContextAccessor => HttpContext is null" ); // <-- This exception throws
if (_httpContextAccessor.HttpContext.User == null) throw new Exception( "HttpContextAccessor => HttpContext => User is null" );

Claim[]? claims = _httpContextAccessor.HttpContext.User.Claims.ToArray();
if (claims == null) {
    throw new Exception( "Can't access Http Context!" );
}
[...]
Why do I think it's Rebus or RabbitMQ? The other APIs are set up the same way, but they don't use Rebus, RabbitMQ or SignalR, and they don't have this problem.
Maybe I'm looking in the wrong place?! I would be very grateful for any ideas.
As soon as the load balancer is taken out of the picture, the behavior disappears.
I checked the RabbitMQ logs and could not find anything unusual. I also checked the RabbitMQ management interface, where you can clearly see that all three servers (APIs) are connected.
It seems as if the servers are not receiving all messages or are not exchanging hub connection IDs with each other.
UPDATE:
It seems to be the negotiate request in the SignalR client. The load balancer apparently routes the connection to a different server after the negotiation. This presumably happens when the load balancer decides to redistribute traffic because many requests arrive at once and sticky sessions are not enabled.
I just found this article. Why I didn't notice it before is a mystery to me. https://seangrimes.dev/post/load-balance-signalr-no-sticky-sessions/
After turning off negotiate, I was no longer able to reproduce the error with the proxy enabled.
const connection = new HubConnectionBuilder()
  .withUrl(`${process.env.NEXT_PUBLIC_API_BASEURI}/${global.endpoints.myhub}`, {
    accessTokenFactory: () => mytoken,
    skipNegotiation: true, // <-- added
    transport: HttpTransportType.WebSockets, // <-- added
  })
  .withHubProtocol(new MessagePackHubProtocol())
  .withAutomaticReconnect()
  .build();
You should, however, read Microsoft's security advice before simply turning off negotiate. See here: https://learn.microsoft.com/en-us/azure/azure-signalr/signalr-concept-client-negotiation
I was able to trace the problem back to the Cloudflare proxy. As suspected, the proxy routes requests to different servers in order to distribute the load. During the negotiate request the client obtains a connection ID that is only known to the server that issued it. If the subsequent connection request lands on a different server, that server does not recognize the connection ID; the API rejects the connection and the HttpContext in the HttpContextAccessor is null.
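For illustration, this is roughly what the SignalR JavaScript client does under the hood when negotiation is enabled. It is a simplified sketch of the two steps, using the field names from the SignalR transport protocol; the hub URL and mytoken are placeholders taken from my snippets above.

// Simplified sketch of the two-step handshake that breaks without sticky sessions.
const hubUrl = `${process.env.NEXT_PUBLIC_API_BASEURI}/myhub`;

async function connectManually(mytoken: string) {
  // Step 1: negotiate. The server that answers this request creates the
  // connection and returns a token that only *this* server knows about.
  const negotiateResponse = await fetch(`${hubUrl}/negotiate?negotiateVersion=1`, {
    method: "POST",
    headers: { Authorization: `Bearer ${mytoken}` },
  });
  const { connectionToken } = await negotiateResponse.json();

  // Step 2: open the actual transport, passing the token as ?id=.
  // If the load balancer routes this request to a different server, that
  // server has never seen the token and rejects the connection.
  const wsUrl =
    hubUrl.replace(/^http/, "ws") +
    `?id=${encodeURIComponent(connectionToken)}` +
    `&access_token=${encodeURIComponent(mytoken)}`;
  return new WebSocket(wsUrl);
}

With the proxy distributing requests and no sticky sessions, nothing guarantees that step 2 reaches the same server that answered step 1, which is exactly the failure described above.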
There are two solutions to the problem; which one fits depends on the application.
The first solution is to enable sticky sessions (session affinity). How sticky sessions are configured differs from provider to provider; for Cloudflare, see the sketch below.
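For Cloudflare specifically, session affinity can be set on the load balancer in the dashboard or via the API. This is a hypothetical sketch, assuming the current shape of Cloudflare's Load Balancing API (the session_affinity and session_affinity_ttl fields); zoneId, lbId and apiToken are placeholders, so check the Cloudflare API docs before using it.

// Hypothetical sketch: enabling cookie-based session affinity on a Cloudflare
// load balancer via the API. The same setting is also available in the dashboard.
async function enableStickySessions(zoneId: string, lbId: string, apiToken: string) {
  const response = await fetch(
    `https://api.cloudflare.com/client/v4/zones/${zoneId}/load_balancers/${lbId}`,
    {
      method: "PATCH",
      headers: {
        "Authorization": `Bearer ${apiToken}`,
        "Content-Type": "application/json",
      },
      // "cookie" pins a client to the origin picked on its first request;
      // session_affinity_ttl is the lifetime of that pin in seconds.
      body: JSON.stringify({
        session_affinity: "cookie",
        session_affinity_ttl: 1800,
      }),
    },
  );
  if (!response.ok) {
    throw new Error(`Cloudflare API returned ${response.status}`);
  }
  return response.json();
}

Keep in mind that Cloudflare's session affinity only works in proxied mode, which is exactly what we wanted to avoid in our setup.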
The second solution is to switch off the negotiate request, which is what I did in my case. However, the security aspects must be taken into account here (https://learn.microsoft.com/en-us/azure/azure-signalr/signalr-concept-client-negotiation).