
How to repair bad IP addresses in standalone Service Fabric cluster


We've just shipped a standalone Service Fabric cluster to a customer site with a misconfiguration. Our setup:

  • Service Fabric 6.4
  • 2 Windows servers, each running 3 Hyper-V virtual machines that host the cluster

We configured the cluster locally using static IP addresses for the nodes. When the servers arrived, the IP addresses of the Hyper-V virtual machines were changed to conform to the customer's available IP addresses. Now we can't connect to the cluster, since every IP in the clusterConfig is wrong. Is there any way we can recover from this without re-installing the cluster? We'd prefer to keep the new IPs assigned to the VMs if possible.


Solution

  • I've tested this only in my test environment (I've never done it in production, so proceed at your own risk), but since you can't connect to the cluster anyway, I think it is worth trying.

    Connect to each virtual machine that is part of the cluster and perform the following steps:

    1. Locate the Service Fabric cluster files (usually C:\ProgramData\SF\{nodeName}\Fabric)
    2. Copy the ClusterManifest.current.xml file to a temp folder (for example C:\temp)
    3. Go to the Fabric.Data subfolder and copy the InfrastructureManifest.xml file to the same temp folder
    4. In each copied file, change the nodes' IP addresses to the correct values
    5. Stop the FabricHostSvc service by running net stop FabricHostSvc in PowerShell
    6. Once the service has stopped, run this PowerShell command (as administrator) to update the node's cluster configuration: New-ServiceFabricNodeConfiguration -ClusterManifestPath C:\temp\ClusterManifest.current.xml -InfrastructureManifestPath C:\temp\InfrastructureManifest.xml
    7. Once the configuration is updated, start the service again: net start FabricHostSvc

    Repeat this on every node and hope for the best.
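
Editing the addresses by hand in step 4 is easy to get wrong when the same change has to be made on six VMs. As a rough sketch only (this is not part of Service Fabric), the replacement could be scripted along these lines. The addresses in `IP_MAP` and the helper names `remap_ips` and `remap_file` are hypothetical, and the sketch assumes the copied manifests are UTF-8 text files and that Python is available wherever the copies are edited.

```python
import re
import shutil
from pathlib import Path

# Hypothetical old -> new address mapping; substitute your own values.
IP_MAP = {
    "192.168.1.11": "10.20.0.11",
    "192.168.1.12": "10.20.0.12",
}

# A dotted-quad token that is not embedded in a longer number or address,
# so "10.0.0.1" is never matched inside "10.0.0.10".
_IPV4 = re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?![\d.])")


def remap_ips(text, ip_map):
    """Return text with each whole IPv4 token replaced if it is a key in ip_map."""
    # A single pass with a lookup avoids chained rewrites (A -> B, then B -> C).
    return _IPV4.sub(lambda m: ip_map.get(m.group(0), m.group(0)), text)


def remap_file(path, ip_map):
    """Rewrite one copied manifest in place, keeping a .bak copy next to it."""
    path = Path(path)
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    # Assumption: the file is UTF-8 encoded.
    original = path.read_text(encoding="utf-8")
    path.write_text(remap_ips(original, ip_map), encoding="utf-8")


# Example call for the copies made in steps 2 and 3 (not executed here):
#   remap_file(r"C:\temp\ClusterManifest.current.xml", IP_MAP)
#   remap_file(r"C:\temp\InfrastructureManifest.xml", IP_MAP)
```

Addresses that are not in the mapping are left untouched, so it is worth diffing each file against its .bak copy before running New-ServiceFabricNodeConfiguration.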