Search code examples
powershellazureazure-service-fabricportswindows-firewall

Why can't I connect to this Service Fabric cluster?


I'm blocked by an error when connecting to a remote service fabric cluster running on premise (not on Azure) using the Connect-ServiceFabricCluster PowerShell command for a network-connected virtual machine:

WARNING: Failed to contact Naming Service. Attempting to contact Failover Manager Service...
WARNING: Failed to contact Failover Manager Service, Attempting to contact FMM...
False
WARNING: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 192.168.1.102:19000
Connect-ServiceFabricCluster : No cluster endpoint is reachable, please check if there is connectivity/firewall/DNS issue.
At Install.ps1:3 char:1
+ Connect-ServiceFabricCluster -ConnectionEndpoint "FABRICTESTSRV:19000" -WindowsCred ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [Connect-ServiceFabricCluster], FabricException
    + FullyQualifiedErrorId : TestClusterConnectionErrorId,Microsoft.ServiceFabric.Powershell.ConnectCluster

The command is:

Connect-ServiceFabricCluster -ConnectionEndpoint "FABRICTESTSRV:19000" -WindowsCredential:$True

Why isn't it working?

Here is what I have tried:

  • I have tried turning Windows Firewall off entirely. No luck there.
  • Connecting locally to the cluster while inside of the virtual machine works just fine: Connect-ServiceFabricCluster "localhost:19000"
  • This is not a DNS issue. I can ping the FQDN of the machine just fine.

Note: This is not an Azure hosted Virtual Machine. This is simply a network-connected virtual machine running Service Fabric Core, vanilla Windows 8.1 x64 fully up to date.

Edit: Get-ServiceFabricClusterManifest reads as follows:

<ClusterManifest xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" Name="ComputerName-Local-Cluster" Version=
"1.0" xmlns="http://schemas.microsoft.com/2011/01/fabric">
  <NodeTypes>
    <NodeType Name="NodeType0">
      <Endpoints>
        <ClientConnectionEndpoint Port="19000" />
        <LeaseDriverEndpoint Port="19001" />
        <ClusterConnectionEndpoint Port="19002" />
        <HttpGatewayEndpoint Port="19080" Protocol="http" />
        <HttpApplicationGatewayEndpoint Port="19081" Protocol="http" />
        <ServiceConnectionEndpoint Port="19006" />
        <ApplicationEndpoints StartPort="30001" EndPort="31000" />
      </Endpoints>
    </NodeType>
    <NodeType Name="NodeType1">
      <Endpoints>
        <ClientConnectionEndpoint Port="19010" />
        <LeaseDriverEndpoint Port="19011" />
        <ClusterConnectionEndpoint Port="19012" />
        <HttpGatewayEndpoint Port="19082" Protocol="http" />
        <HttpApplicationGatewayEndpoint Port="19083" Protocol="http" />
        <ServiceConnectionEndpoint Port="19016" />
        <ApplicationEndpoints StartPort="31001" EndPort="32000" />
      </Endpoints>
    </NodeType>
    <NodeType Name="NodeType2">
      <Endpoints>
        <ClientConnectionEndpoint Port="19020" />
        <LeaseDriverEndpoint Port="19021" />
        <ClusterConnectionEndpoint Port="19022" />
        <HttpGatewayEndpoint Port="19084" Protocol="http" />
        <HttpApplicationGatewayEndpoint Port="19085" Protocol="http" />
        <ServiceConnectionEndpoint Port="19026" />
        <ApplicationEndpoints StartPort="32001" EndPort="33000" />
      </Endpoints>
    </NodeType>
    <NodeType Name="NodeType3">
      <Endpoints>
        <ClientConnectionEndpoint Port="19030" />
        <LeaseDriverEndpoint Port="19031" />
        <ClusterConnectionEndpoint Port="19032" />
        <HttpGatewayEndpoint Port="19086" Protocol="http" />
        <HttpApplicationGatewayEndpoint Port="19087" Protocol="http" />
        <ServiceConnectionEndpoint Port="19036" />
        <ApplicationEndpoints StartPort="33001" EndPort="34000" />
      </Endpoints>
    </NodeType>
    <NodeType Name="NodeType4">
      <Endpoints>
        <ClientConnectionEndpoint Port="19040" />
        <LeaseDriverEndpoint Port="19041" />
        <ClusterConnectionEndpoint Port="19042" />
        <HttpGatewayEndpoint Port="19088" Protocol="http" />
        <HttpApplicationGatewayEndpoint Port="19089" Protocol="http" />
        <ServiceConnectionEndpoint Port="19046" />
        <ApplicationEndpoints StartPort="34001" EndPort="35000" />
      </Endpoints>
    </NodeType>
  </NodeTypes>
  <Infrastructure>
    <WindowsServer IsScaleMin="true">
      <NodeList>
        <Node NodeName="_Node_0" IPAddressOrFQDN="localhost" IsSeedNode="true" NodeTypeRef="NodeType0" FaultDomain="fd:/0" UpgradeDomain="0" />
        <Node NodeName="_Node_1" IPAddressOrFQDN="localhost" IsSeedNode="true" NodeTypeRef="NodeType1" FaultDomain="fd:/1" UpgradeDomain="1" />
        <Node NodeName="_Node_2" IPAddressOrFQDN="localhost" IsSeedNode="true" NodeTypeRef="NodeType2" FaultDomain="fd:/2" UpgradeDomain="2" />
        <Node NodeName="_Node_3" IPAddressOrFQDN="localhost" NodeTypeRef="NodeType3" FaultDomain="fd:/3" UpgradeDomain="3" />
        <Node NodeName="_Node_4" IPAddressOrFQDN="localhost" NodeTypeRef="NodeType4" FaultDomain="fd:/4" UpgradeDomain="4" />
      </NodeList>
    </WindowsServer>
  </Infrastructure>
  <FabricSettings>
    <Section Name="Security">
      <Parameter Name="ClusterCredentialType" Value="None" />
      <Parameter Name="ServerAuthCredentialType" Value="None" />
    </Section>
    <Section Name="FailoverManager">
      <Parameter Name="ExpectedClusterSize" Value="4" />
      <Parameter Name="TargetReplicaSetSize" Value="3" />
      <Parameter Name="MinReplicaSetSize" Value="3" />
      <Parameter Name="ReconfigurationTimeLimit" Value="20" />
      <Parameter Name="BuildReplicaTimeLimit" Value="20" />
      <Parameter Name="CreateInstanceTimeLimit" Value="20" />
      <Parameter Name="PlacementTimeLimit" Value="20" />
    </Section>
    <Section Name="ReconfigurationAgent">
      <Parameter Name="ServiceApiHealthDuration" Value="20" />
      <Parameter Name="ServiceReconfigurationApiHealthDuration" Value="20" />
      <Parameter Name="LocalHealthReportingTimerInterval" Value="5" />
      <Parameter Name="IsDeactivationInfoEnabled" Value="true" />
      <Parameter Name="RAUpgradeProgressCheckInterval" Value="3" />
    </Section>
    <Section Name="ClusterManager">
      <Parameter Name="TargetReplicaSetSize" Value="3" />
      <Parameter Name="MinReplicaSetSize" Value="3" />
      <Parameter Name="UpgradeStatusPollInterval" Value="5" />
      <Parameter Name="UpgradeHealthCheckInterval" Value="5" />
      <Parameter Name="FabricUpgradeHealthCheckInterval" Value="5" />
    </Section>
    <Section Name="NamingService">
      <Parameter Name="TargetReplicaSetSize" Value="3" />
      <Parameter Name="MinReplicaSetSize" Value="3" />
    </Section>
    <Section Name="Management">
      <Parameter Name="ImageStoreConnectionString" Value="file:C:\SfDevCluster\Data\ImageStoreShare" />
      <Parameter Name="ImageCachingEnabled" Value="false" />
      <Parameter Name="EnableDeploymentAtDataRoot" Value="true" />
    </Section>
    <Section Name="Hosting">
      <Parameter Name="EndpointProviderEnabled" Value="true" />
      <Parameter Name="RunAsPolicyEnabled" Value="true" />
      <Parameter Name="DeactivationScanInterval" Value="60" />
      <Parameter Name="DeactivationGraceInterval" Value="10" />
      <Parameter Name="EnableProcessDebugging" Value="true" />
      <Parameter Name="ServiceTypeRegistrationTimeout" Value="20" />
      <Parameter Name="CacheCleanupScanInterval" Value="300" />
    </Section>
    <Section Name="HttpGateway">
      <Parameter Name="IsEnabled" Value="true" />
    </Section>
    <Section Name="PlacementAndLoadBalancing">
      <Parameter Name="MinLoadBalancingInterval" Value="300" />
    </Section>
    <Section Name="Federation">
      <Parameter Name="NodeIdGeneratorVersion" Value="V4" />
      <Parameter Name="UnresponsiveDuration" Value="0" />
    </Section>
    <Section Name="ApplicationGateway/Http">
      <Parameter Name="IsEnabled" Value="true" />
    </Section>
    <Section Name="FaultAnalysisService">
      <Parameter Name="TargetReplicaSetSize" Value="3" />
      <Parameter Name="MinReplicaSetSize" Value="3" />
    </Section>
    <Section Name="Trace/Etw">
      <Parameter Name="Level" Value="4" />
    </Section>
    <Section Name="Diagnostics">
      <Parameter Name="ProducerInstances" Value="ServiceFabricEtlFile, ServiceFabricPerfCtrFolder" />
      <Parameter Name="MaxDiskQuotaInMB" Value="10240" />
    </Section>
    <Section Name="ServiceFabricEtlFile">
      <Parameter Name="ProducerType" Value="EtlFileProducer" />
      <Parameter Name="IsEnabled" Value="true" />
      <Parameter Name="EtlReadIntervalInMinutes" Value=" 5" />
      <Parameter Name="DataDeletionAgeInDays" Value="3" />
    </Section>
    <Section Name="ServiceFabricPerfCtrFolder">
      <Parameter Name="ProducerType" Value="FolderProducer" />
      <Parameter Name="IsEnabled" Value="true" />
      <Parameter Name="FolderType" Value="ServiceFabricPerformanceCounters" />
      <Parameter Name="DataDeletionAgeInDays" Value="3" />
    </Section>
    <Section Name="TransactionalReplicator">
      <Parameter Name="CheckpointThresholdInMB" Value="64" />
    </Section>
  </FabricSettings>
</ClusterManifest>

Solution

  • Why isn't it working?

    It is not working because you set IP address of your nodes as localhost thus making them undiscoverable. It will work for local debug cluster, but for on-premises and for Azure clusters you have to specify valid and reachable IP address or qualified name.

    Also, I'm not 100% sure right now, but I can suggest to specify FQDN instead of IP address if you want your cluster be accessible by URI and not by IP. I remember I had troubles with this, but it is still not clear what has helped — FQDN or something else.