I have a Service Fabric cluster that seems to have been stuck in the rollback phase of an automatic upgrade for over seven days.
This is the output from Get-ServiceFabricClusterUpgrade:
TargetCodeVersion : 5.5.216.0
TargetConfigVersion : 2
StartTimestampUtc : 15/06/2017 23:44:40
FailureTimestampUtc : 16/06/2017 01:41:48
FailureReason : HealthCheck
UpgradeState : RollingBackInProgress
UpgradeDuration : 7.14:13:10
CurrentUpgradeDomainDuration : 7.12:16:03
CurrentUpgradeDomainProgress : 0
NodeName : xxxxxxxxxxxxxxxxxxxxx
UpgradePhase : PreUpgradeSafetyCheck
PendingSafetyChecks :
WaitForInbuildReplica - PartitionId: xxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx
NextUpgradeDomain : 1
UpgradeDomainsStatus : { "0" = "InProgress";
"1" = "Pending";
"2" = "Pending";
"3" = "Pending";
"4" = "Pending" }
The only other cmdlets in the Service Fabric PowerShell module that seem related are Start-ServiceFabricClusterUpgrade, Resume-ServiceFabricClusterUpgrade and Update-ServiceFabricClusterUpgrade.
I have tried Start-ServiceFabricClusterUpgrade with the -Force switch, hoping it would cancel the hanging upgrade and start a new one, but unfortunately it did not. I have also restarted the node that is in progress, but that made no difference either.
In the absence of a Stop-ServiceFabricClusterUpgrade, is there anything else I can do to stop this process?
What I did in the end was log onto the nodes in the cluster one by one and restart them, waiting for each node to come back up before restarting the next.
This fixed it, and the upgrade process eventually finished. Restarting the VMSS would probably have achieved the same thing, but I'm not sure whether there would have been a service outage during the restart. It certainly would have been less time consuming.
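For anyone who would rather script the one-at-a-time restart than log onto each machine, here is a minimal sketch. It assumes a session already connected with Connect-ServiceFabricCluster, and it uses Restart-ServiceFabricNode, which restarts the Service Fabric node process rather than the underlying VM, so it may not behave exactly like the manual restarts described above. The $pollSeconds value is an arbitrary choice for illustration.

```powershell
# Assumption: Connect-ServiceFabricCluster has already been run in this session.
# Assumption: restarting the Fabric node process is an acceptable stand-in
# for the manual restarts; this has not been verified against a stuck rollback.
$pollSeconds = 15   # arbitrary polling interval, chosen for illustration

foreach ($node in Get-ServiceFabricNode) {
    $name = $node.NodeName
    Write-Host "Restarting $name"
    Restart-ServiceFabricNode -NodeName $name -CommandCompletionMode Verify

    # Wait for this node to report Up before moving on to the next one.
    do {
        Start-Sleep -Seconds $pollSeconds
        $status = (Get-ServiceFabricNode -NodeName $name).NodeStatus
    } while ($status -ne 'Up')
}

# See whether the rollback has moved past the stuck upgrade domain.
Get-ServiceFabricClusterUpgrade | Select-Object UpgradeState, NextUpgradeDomain
```

Restarting strictly one node at a time is what keeps the services available during the process. The loop has no timeout, so a node that never returns to Up will block it; add one if you run this unattended.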