
Error "The cluster is going through a an upgrade which cannot be interrupted" on Service Fabric


I am getting an error from Service Fabric that will not go away:

C:\Users\pks>armclient put /subscriptions/8393a037-5d39-462d-a583-09915b4493df/resourcegroups/TestServiceFabric11/providers/Microsoft.ServiceFabric/clusters/pksservicefabric11?api-version=2016-03-01 @updatenodesga.json
{
  "error": {
    "code": "PendingClusterUpgradeCannotBeInterrupted",
    "message": "The cluster is going through a an upgrade which cannot be interrupted."
  }
}

In the resource properties, the cluster state has gone into an "AutoScale" mode, and I have no idea what that means:

"provisioningState": "Failed",
"clusterId": "bfb52d19-238b-4046-8e35-ad95697c79b6",
"clusterCodeVersion": "5.0.135.9590",
"clusterState": "AutoScale",
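Here is a minimal sketch of checking the clusterState property before re-sending the PUT. It assumes you have already fetched the cluster resource as JSON (for example with `armclient get` on the same resource URL). The helpers `cluster_state` and `wait_until_ready` and the `fetch_properties` callback are hypothetical. Only the `clusterState` and `provisioningState` property names come from the output above. It also assumes "Ready" is the idle state and treats anything else, including "AutoScale", as an upgrade still in progress.

```python
import json
import time
from typing import Callable


def cluster_state(resource_json: str) -> str:
    """Return the clusterState value from a cluster resource or properties JSON."""
    doc = json.loads(resource_json)
    # Full ARM responses nest the fields under "properties"; a bare
    # properties object (like the snippet above) is accepted too.
    props = doc.get("properties", doc)
    return props["clusterState"]


def wait_until_ready(
    fetch_properties: Callable[[], str],
    poll_seconds: float = 60.0,
    max_polls: int = 60,
) -> bool:
    """Poll until clusterState is "Ready"; return False if it never gets there.

    fetch_properties is a hypothetical callback returning the current cluster
    resource JSON (e.g. the output of `armclient get ...`).
    """
    for attempt in range(max_polls):
        if cluster_state(fetch_properties()) == "Ready":
            return True
        if attempt < max_polls - 1:
            time.sleep(poll_seconds)
    return False


if __name__ == "__main__":
    snapshot = '{"provisioningState": "Failed", "clusterState": "AutoScale"}'
    print(cluster_state(snapshot))  # AutoScale
```

Only re-issue the `armclient put` once `wait_until_ready` returns True. Polling is used because the upgrade finishes asynchronously on the service side.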

If anyone from the Service Fabric team could comment on what AutoScale means, that would be nice. I have been able to update the resource before, even with provisioningState set to Failed, but AutoScale is something I haven't seen before.


Solution

  • It looks like you have two questions here:

    1) When is the cluster state set to "AutoScale"? The cluster state is set to "AutoScale" whenever there is a change in reliability level. See https://azure.microsoft.com/en-in/documentation/articles/service-fabric-cluster-capacity/ for details on reliability levels.

    2) Why did you get the error message "The cluster is going through a an upgrade which cannot be interrupted."?

    It looks like you have deleted your cluster; otherwise it would have been easier to pinpoint exactly what happened here. Here is what I think may have happened; please provide repro steps if my guess is incorrect.

    As part of the scale-up, after adding VM instances, you changed the reliability level from Silver to Gold. This prompted the SF cluster to change the target replica set sizes of the system services and to mark the cluster state as "AutoScale". This kind of configuration upgrade is also marked as "uninterruptible" by the system, because it affects the system services. Before this upgrade had finished, you tried to scale down the cluster by changing the reliability level back from Gold to Silver, so the system blocked the request and raised the error message.
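The sequence described above can be modeled with a toy state machine. This is a hypothetical illustration, not Service Fabric's actual implementation: `ClusterModel`, `PendingUpgradeError` and their methods are invented for this sketch. The per-tier replica counts are the system-service target replica set sizes listed in the capacity-planning article linked above (Bronze 3, Silver 5, Gold 7, Platinum 9), as I understand them, so verify them against the current docs. Only the "AutoScale" state, the error code and the Silver/Gold sequence come from this thread.

```python
# System-service target replica set size per reliability tier
# (values as listed in the Service Fabric capacity-planning article).
TARGET_REPLICA_SET_SIZE = {"Bronze": 3, "Silver": 5, "Gold": 7, "Platinum": 9}


class PendingUpgradeError(Exception):
    """Stands in for the PendingClusterUpgradeCannotBeInterrupted ARM error."""

    code = "PendingClusterUpgradeCannotBeInterrupted"


class ClusterModel:
    """Toy model: changing reliability starts an uninterruptible upgrade."""

    def __init__(self, reliability: str) -> None:
        self.reliability = reliability
        self.replica_set_size = TARGET_REPLICA_SET_SIZE[reliability]
        self.cluster_state = "Ready"

    def put(self, reliability: str) -> None:
        # While the uninterruptible upgrade runs, every new update is refused.
        if self.cluster_state == "AutoScale":
            raise PendingUpgradeError(
                "The cluster is going through an upgrade which cannot be interrupted."
            )
        target = TARGET_REPLICA_SET_SIZE[reliability]
        if target != self.replica_set_size:
            # Resizing the system services' replica sets is what marks
            # the cluster as "AutoScale" in this model.
            self.reliability = reliability
            self.replica_set_size = target
            self.cluster_state = "AutoScale"

    def finish_upgrade(self) -> None:
        # In the real service this happens asynchronously, some time later.
        self.cluster_state = "Ready"


if __name__ == "__main__":
    cluster = ClusterModel("Silver")
    cluster.put("Gold")        # scale up: Silver -> Gold, enters AutoScale
    try:
        cluster.put("Silver")  # scale down before the upgrade has finished
    except PendingUpgradeError as err:
        print(err.code)        # PendingClusterUpgradeCannotBeInterrupted
    cluster.finish_upgrade()
    cluster.put("Silver")      # accepted once the cluster is Ready again
```

The practical takeaway under this model is to wait for the reliability change to complete before submitting the next change, rather than retrying the PUT immediately.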