Search code examples
azure-service-fabricservice-fabric-on-premises

Connect to On Premises Service Fabric Cluster


I've followed the steps from Microsoft to create a Multi-Node On-Premises Service Fabric cluster. I've deployed a stateless app to the cluster and it seems to be working fine. When I have been connecting to the cluster I have used the IP Address of one of the nodes. Doing that, I can connect via Powershell using Connect-ServiceFabricCluster nodename:19000 and I can connect to the Service Fabric Explorer website (http://nodename:19080/explorer/index.html).

The examples online suggest that if I hosted in Azure I can connect to http://mycluster.eastus.cloudapp.azure.com:19000 and it resolves, however I can't work out what the equivalent is on my local. I tried connecting to my sample cluster: Connect-ServiceFabricCluster sampleCluster.domain.local:19000 but that returns:

WARNING: Failed to contact Naming Service. Attempting to contact Failover Manager Service... WARNING: Failed to contact Failover Manager Service, Attempting to contact FMM... False WARNING: No such host is known

Connect-ServiceFabricCluster : No cluster endpoint is reachable, please check if there is connectivity/firewall/DNS issue.

Am I missing something in my setup? Should there be a central DNS entry somewhere that allows me to connect to the cluster? Or am I trying to do something that isn't supported On-Premises?


Solution

  • Yup, you're missing a load balancer.

    This is the best resource I could find to help, I'll paste relevant contents in the event of it becoming unavailable.

    Reverse Proxy — When you provision a Service Fabric cluster, you have an option of installing Reverse Proxy on each of the nodes on the cluster. It performs the service resolution on the client’s behalf and forwards the request to the correct node which contains the application. In majority of the cases, services running on the Service Fabric run only on the subset of the nodes. Since the load balancer will not know which nodes contain the requested service, the client libraries will have to wrap the requests in a retry-loop to resolve service endpoints. Using Reverse Proxy will address the issue since it runs on each node and will know exactly on what nodes is the service running on. Clients outside the cluster can reach the services running inside the cluster via Reverse Proxy without any additional configuration.

    Typical Service Fabric Cluster Infrastructure Diagram

    Source: Azure Service Fabric is amazing

    I have an Azure Service Fabric resource running, but the same rules apply. As the article states, you'll need a reverse proxy/load balancer to resolve not only what nodes are running the API, but also to balance the load between the nodes running that API. So, health probes are necessary too so that the load balancer knows which nodes are viable options for sending traffic to.

    As an example, Azure creates 2 rules off the bat: 1. LBHttpRule on TCP/19080 with a TCP probe on port 19080 every 5 seconds with a 2 count error threshold. 2. LBRule on TCP/19000 with a TCP probe on port 19000 every 5 seconds with a 2 count error threshold.

    What you need to add to make this forward-facing is a rule where you forward port 80 to your service http port. Then the health probe can be an http probe that hits a path to test a 200 return.

    Once you get into the cluster, you can resolve the services normally and SF will take care of availability.

    In Azure-land, this is abstracted again to using something like API Management to further reverse proxy it to SSL. What a mess but it works.

    Once your load balancer is set up, you'll have a single IP to hit for management, publishing, and regular traffic.