azure-service-fabric

How do I stop a Service Fabric application upgrade?


How do I stop an Azure Service Fabric compose application upgrade that is failing and never times out?

The upgrade details are below; no timeout is set. I know what is wrong with the application (no registry username/password was specified), but I cannot cancel the current upgrade.

UPGRADE DETAILS
Name    fabric:/planet
Type Name   Compose_5
Target Application Type Version v7
Upgrade Domains
Name    State
UD0 InProgress
UD1 Pending
UD2 Pending
Upgrade State   RollingForwardInProgress
Next Upgrade Domain UD1
Rolling Upgrade Mode    UnmonitoredAuto
Upgrade Description
Name    fabric:/planet
Target Application Type Version v7
Upgrade Kind    Rolling
Rolling Upgrade Mode    UnmonitoredAuto
Upgrade Replica Set Check Timeout In Seconds    4294967295
Force Restart   false
Monitoring Policy
Failure Action  Manual
Health Check Wait Duration  0.00:00:00.0
Health Check Stable Duration    0.00:02:00.0
Health Check Retry Timeout  0.00:10:00.0
Upgrade Timeout Infinity
Upgrade Domain Timeout  Infinity
Upgrade Duration    0.00:21:01.241.0700000000652
Upgrade Domain Duration 0.00:21:01.241.0700000000652
Current Upgrade Domain Progress
Domain Name UD0
Node Upgrade Progress List
Node Name   Upgrade Phase   Pending Safety Checks
CONTAINERHOST1  Upgrading   (empty)
Start Timestamp Utc Fri, 03 Aug 2018 02:20:34 GMT
Failure Timestamp Utc   N/A
Failure Reason  None

Solution

  • Because you've set the failure action to Manual, the cluster will stay stuck waiting for your action.

    You could try Start-ServiceFabricApplicationRollback to roll the upgrade back, or Resume-ServiceFabricApplicationUpgrade to continue it.
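
    As a minimal sketch of those two options, assuming the PowerShell session can reach the cluster and using the application name fabric:/planet from the question (the remote endpoint mentioned in the comment is a placeholder, not something from the original post):

    ```powershell
    # Connect first; with no arguments this targets a local cluster.
    # For a remote cluster, pass -ConnectionEndpoint (for example
    # "mycluster.example.com:19000", which is a placeholder).
    Connect-ServiceFabricCluster

    # Inspect the current state of the upgrade before changing anything.
    Get-ServiceFabricApplicationUpgrade -ApplicationName fabric:/planet

    # Option 1: abandon the upgrade and roll back to the previous version.
    Start-ServiceFabricApplicationRollback -ApplicationName fabric:/planet

    # Option 2 (instead of option 1): move on to the next upgrade domain.
    # UD1 is the "Next Upgrade Domain" shown in the upgrade details.
    # Resume-ServiceFabricApplicationUpgrade -ApplicationName fabric:/planet -UpgradeDomainName UD1
    ```

    Run only one of the two options. Option 2 is left commented out because resuming would keep rolling forward a version that is already known to be broken.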

    The recommended approach for upgrading a compose deployment is to use the parameters -Monitored -FailureAction Rollback:

    Start-ServiceFabricComposeDeploymentUpgrade -DeploymentName mydeployment -Compose docker-compose.yml -Monitored -FailureAction Rollback
    

    Unless you actually want this manual intervention, Service Fabric should handle a failed upgrade by itself, provided the upgrade parameters are configured correctly.

    Fixing these settings might solve your problem:

    Rolling Upgrade Mode is set to UnmonitoredAuto. This mode moves through the upgrade domains automatically but does not perform health checks. Consider using Monitored instead.

    Upgrade Domain Timeout and Upgrade Timeout are set to Infinity. Give both a finite value; otherwise the upgrade will wait forever.

    Failure Action is set to Manual, so the upgrade is suspended to let you investigate the deployment before taking any further action. Consider using Rollback instead.
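
    Put together, an upgrade with those three settings changed might look like the sketch below. The deployment name, the registry credentials, and every timeout value are illustrative assumptions, not values from the question; pick timeouts that match how long your containers really take to start.

    ```powershell
    # Placeholder credentials (assumptions); the question mentions that the
    # original upgrade failed because none were supplied.
    $registryUser     = "myregistryuser"
    $registryPassword = "myregistrypassword"

    # Monitored mode with automatic rollback and finite timeouts (in seconds).
    # The values 600 / 1200 / 3600 are examples only.
    Start-ServiceFabricComposeDeploymentUpgrade `
        -DeploymentName mydeployment `
        -Compose docker-compose.yml `
        -RegistryUserName $registryUser `
        -RegistryPassword $registryPassword `
        -Monitored `
        -FailureAction Rollback `
        -HealthCheckRetryTimeoutSec 600 `
        -UpgradeDomainTimeoutSec 1200 `
        -UpgradeTimeoutSec 3600

    # Check progress afterwards.
    Get-ServiceFabricComposeDeploymentUpgrade -DeploymentName mydeployment
    ```

    With these settings, an upgrade domain that is still unhealthy when a timeout expires should trigger a rollback instead of leaving the upgrade waiting indefinitely.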

    You might have to configure other parameters as well. To understand these parameters, take a look in here and here. For compose deployment, check this: