Tags: azure, http, azure-service-fabric, azure-load-balancer

HTTPS request always goes to the same node in Azure Service Fabric VM Scale Set


I have a service running in Azure Service Fabric that is exposed to the world using Azure Load Balancer (which has a public IP).

When I send 1000 requests from a machine within a span of 3 minutes, all the requests are routed to the same node. I expect them to be distributed across all 5 nodes of my VM scale set.

I didn't configure any session persistence settings on my load balancer. The link below mentions that the load balancer has an idle timeout of 4 minutes by default. Is this causing all my requests to go to the same node?

https://azure.microsoft.com/en-us/blog/new-configurable-idle-timeout-for-azure-load-balancer/


Solution

  • On Azure LB, whenever a connection is made between the client and the service, the load balancer keeps that connection routed to the same server or service. The idle timeout controls how long a connection can stay idle before the load balancer drops it, after which the client has to open a new connection. A new connection may be routed to a different server or service, although it can also land on the same one.

    Because you are likely using the same connection to send all these requests, the load balancer sees them as coming from the same client and going to the same service, so it keeps the existing connection alive.

    Reducing the idle timeout is only advisable if you have a good reason, because creating new connections adds latency and may hurt network performance. This is probably why the minimum is limited to 4 minutes.

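    The load balancer distribution mode uses a 2-tuple or 5-tuple hash to keep these 'sticky' connections (a 3-tuple mode that adds the protocol also exists). Check whether yours is using the 5-tuple mode, which is the default. With 5-tuple distribution, the hash covers source IP, source port, destination IP, destination port, and protocol, so the client port is also taken into account when a connection is opened.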

    If there is only one client making these requests, you have to open multiple connections from that single process, and each connection will use a different source port.

    If you use multiple clients, this shouldn't be a problem. If these tests did come from multiple clients, confirm that they are not reusing the same connection.

    In .NET, you might have to tweak the settings on the ServicePoint and ServicePointManager classes, which control how HTTP connections are pooled and reused.

    You might also want to take a look at this blog post: how-to-fix-load-balancer-not-working-in-round-robin-fashion-for-your-cloud-service. It is written for Cloud Services but uses a similar approach.
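
To illustrate the point about connections and source ports, here is a minimal Python sketch of 5-tuple distribution. It is only a simulation under stated assumptions: SHA-256 stands in for whatever hash the load balancer actually uses, and the node names, IP addresses, and the `pick_node` and `simulate` helpers are hypothetical names made up for this example.

```python
import hashlib
import random
from collections import Counter

# Hypothetical backend pool: five nodes, as in the question.
NODES = ["node0", "node1", "node2", "node3", "node4"]


def pick_node(src_ip, src_port, dst_ip, dst_port, protocol, nodes=NODES):
    """Map a 5-tuple to a node. SHA-256 is a stand-in, not Azure's real hash."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}/{protocol}".encode()
    digest = hashlib.sha256(key).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def simulate(requests, reuse_connection, seed=0):
    """Count how many requests reach each node.

    reuse_connection=True  -> one TCP connection, so one source port for all requests.
    reuse_connection=False -> a new connection (new ephemeral port) per request.
    """
    rng = random.Random(seed)
    counts = Counter()
    port = rng.randint(49152, 65535)  # ephemeral port of the first connection
    for _ in range(requests):
        if not reuse_connection:
            port = rng.randint(49152, 65535)  # a new connection gets a new port
        # Example addresses taken from the documentation-only IP ranges.
        node = pick_node("203.0.113.10", port, "198.51.100.20", 443, "tcp")
        counts[node] += 1
    return counts


if __name__ == "__main__":
    print("reused connection:", dict(simulate(1000, reuse_connection=True)))
    print("new connection per request:", dict(simulate(1000, reuse_connection=False)))
```

With a reused connection, the 5-tuple never changes, so every request lands on a single node, which matches the behaviour described in the question. Opening a new connection per request changes the source port each time, so the requests spread across the nodes.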