I am trying to use the FabricClient API in order to simulate a graceful failure (like partition/replica/instance restart), but for some reason the service keeps recovering.
The only time it finally succeeds is when I manually delete the service from the cluster UI; then I can see that it is stuck because RunAsync is stuck. (I have written a special dummy service which doesn't honor the cancellation token.)
These are my attempts:
    foreach (var service in Services)
    {
        var partitions = FabricClient.QueryManager.GetPartitionListAsync(service.ServiceName).Result;
        foreach (var partition in partitions)
        {
            var operationGuid = Guid.NewGuid();
            restartOperationsIds.Add(operationGuid);
            var partitionId = partition.PartitionInformation.Id;
            FabricClient.FaultManager.RestartReplicaAsync(
                ReplicaSelector.PrimaryOf(PartitionSelector.PartitionIdOf(service.ServiceName, partitionId)),
                CompletionMode.Verify, CancellationToken.None);
            FabricClient.TestManager.StartPartitionRestartAsync(operationGuid,
                PartitionSelector.PartitionIdOf(service.ServiceName, partitionId),
                RestartPartitionMode.AllReplicasOrInstances, TimeSpan.FromMinutes(2));
        }
    }
RestartReplicaAsync doesn't seem to do anything, while StartPartitionRestartAsync makes the service appear to restart, but then it comes back up successfully again.
The cancellation token is cancelled in only a few scenarios, and most of these scenarios are for maintenance reasons.
There are other events where the services are forcefully shut down and the token is not cancelled, for example when you call Restart-ServiceFabricDeployedCodePackage, Restart-ServiceFabricPartition, or Restart-ServiceFabricNode.
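To make the distinction concrete, here is a minimal sketch of a loop that honors a cancellation token, the opposite of the dummy service in the question. It uses only plain .NET types. `CancellationDemo` and `RunLoopAsync` are hypothetical names standing in for a service's RunAsync and are not part of the Service Fabric API.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CancellationDemo
{
    // Hypothetical stand-in for a service's RunAsync loop (not the Service Fabric base class).
    // Returns the number of iterations completed before cancellation was observed.
    public static async Task<int> RunLoopAsync(CancellationToken cancellationToken)
    {
        int iterations = 0;
        try
        {
            while (true)
            {
                // Check the token at the top of every iteration.
                cancellationToken.ThrowIfCancellationRequested();
                iterations++;

                // Passing the token to Task.Delay lets the wait end early on cancellation.
                await Task.Delay(TimeSpan.FromMilliseconds(10), cancellationToken);
            }
        }
        catch (OperationCanceledException)
        {
            // Graceful path: the token was cancelled, so leave the loop and return.
        }
        return iterations;
    }

    public static async Task Main()
    {
        using var cts = new CancellationTokenSource();
        Task<int> loop = RunLoopAsync(cts.Token);

        await Task.Delay(100);   // let the loop run for a moment
        cts.Cancel();            // simulate a graceful shutdown request

        int iterations = await loop;
        Console.WriteLine($"Loop exited after {iterations} iterations.");
    }
}
```

A loop written this way exits shortly after the token is cancelled. A forceful shutdown, such as the code package restart mentioned above, terminates the process without the loop ever observing a cancelled token.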