azure-service-fabric service-fabric-stateless

Service Fabric ServicePartitionResolver.ResolveAsync appears to ignore load balancer


I have a stateless service that acts as a gateway for all requests into my 5-node cluster.

This service forwards requests on to services within the cluster:

    protected virtual async Task<ResolvedServicePartition> FindPartitionAsync(long key = 0)
    {
        var resolver = ServicePartitionResolver.GetDefault();
        var result = await resolver.ResolveAsync(FullServiceName, ServicePartitionKey.Singleton, CancellationToken.None).ConfigureAwait(false);
        return result;
    }

    private async Task<string> EstablishProxyUrlAsync(string method, long key = 0)
    {
        var partition = await FindPartitionAsync(key).ConfigureAwait(false);

        if (key != 0)
        {
            Log.Information($"{this.GetType().Name} method {method} request resolved by partition {partition.Info.Id}");
        }

        var endpoints = JObject.Parse(partition.GetEndpoint().Address)["Endpoints"];
        var address = endpoints[""].ToString().TrimEnd('/');

        var proxyUrl = $"{address}/api/{Area}/{method}";

        return proxyUrl;
    }

I suspect that if I have a service, TestService, running on all 5 nodes of my cluster, the code above ignores the load balancer, so the request simply goes to the instance on the node that picked up the request.

Is there any way to fix this?

Do I need to implement my own load balancer, then? All calls from the outside come into the gateway, as that seemed to be the recommended approach, i.e. a single point of entry. However, it appears that this approach will slow things down and put more load on a specific node, as there is no load balancer to pick the best node. E.g. if I have a gateway method GetCars that calls GetCars on a stateless service running on all 5 nodes, I want a way of load balancing to one of those nodes rather than having all requests go to the local instance.

Paul


Solution

  • I'd expect the resolved endpoint address to contain the internal IP address of the node that hosts the resolved instance of TestService (or the primary replica, for a stateful service) and the port that its listener uses.

    • For a stateful service, this can only ever be a single endpoint.

    • For a Singleton service, you'll get a cached result from the ServicePartitionResolver.

    You can force a refresh by calling the resolver.ResolveAsync() overload that takes the earlier ResolvedServicePartition.
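
A minimal sketch of that refresh, assuming the Microsoft.ServiceFabric.Services NuGet package is referenced. The class and method names (PartitionRefresh, RefreshAsync) are hypothetical; only ServicePartitionResolver.GetDefault() and the ResolveAsync overload that accepts a previous ResolvedServicePartition come from the SDK.

```csharp
using System.Fabric;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Services.Client;

// Hypothetical helper, not part of the Service Fabric SDK.
public static class PartitionRefresh
{
    // Passing the previously resolved partition tells the resolver that the
    // caller considers that result stale, so it should not simply hand back
    // the same cached entry.
    public static Task<ResolvedServicePartition> RefreshAsync(
        ResolvedServicePartition previous,
        CancellationToken cancellationToken = default(CancellationToken))
    {
        var resolver = ServicePartitionResolver.GetDefault();
        return resolver.ResolveAsync(previous, cancellationToken);
    }
}
```

In the gateway from the question, this would typically be called after a request to the previously resolved address fails, for example `partition = await PartitionRefresh.RefreshAsync(partition).ConfigureAwait(false);`.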

    Also, as internal calls are not made over the internet, they do not pass through the (Azure) load balancer.

    Added more info:

    You'll likely run the gateway on all nodes; if not, make sure you do. As every gateway instance has its own resolver that resolves to a 'random' instance, load should then be spread across the downstream services automatically.
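
If you would rather make the spreading explicit in the gateway than rely on which instance the resolver hands back, one option is to pick at random from all endpoints of the resolved partition. The sketch below is an assumption-laden illustration, not SDK code: EndpointPicker, ParseListener and PickRandom are hypothetical names, and it assumes each endpoint address is JSON shaped like the one the question's code parses, with an "Endpoints" object keyed by listener name. It uses System.Text.Json instead of the question's JObject so that it needs no extra package.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper, not part of the Service Fabric SDK.
public static class EndpointPicker
{
    private static readonly Random Rng = new Random();

    // Extracts one listener URL from an endpoint address such as
    // {"Endpoints":{"":"http://10.0.0.4:8080/"}} and trims the trailing slash.
    // The default listener name is the empty string, as in the question's code.
    public static string ParseListener(string endpointAddressJson, string listenerName = "")
    {
        using (var doc = JsonDocument.Parse(endpointAddressJson))
        {
            var endpoints = doc.RootElement.GetProperty("Endpoints");
            foreach (var listener in endpoints.EnumerateObject())
            {
                if (listener.Name == listenerName)
                {
                    return (listener.Value.GetString() ?? string.Empty).TrimEnd('/');
                }
            }
        }
        throw new KeyNotFoundException("No listener named '" + listenerName + "' in endpoint address.");
    }

    // Picks one address uniformly at random and returns its listener URL.
    public static string PickRandom(IReadOnlyList<string> endpointAddresses)
    {
        if (endpointAddresses == null || endpointAddresses.Count == 0)
        {
            throw new ArgumentException("At least one endpoint address is required.", nameof(endpointAddresses));
        }

        int index;
        lock (Rng) // System.Random is not thread-safe
        {
            index = Rng.Next(endpointAddresses.Count);
        }
        return ParseListener(endpointAddresses[index]);
    }
}
```

In the gateway, the input list could be built from the resolved partition, for example `partition.Endpoints.Select(e => e.Address).ToList()`, and the returned URL used in place of `address` in EstablishProxyUrlAsync.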

    P.S. Have a look at Traefik; it could help solve this problem without you having to build a solid reverse proxy yourself.

    more info here and here