c# akka akka.net

Akka.NET: restarting the actor system when a node is quarantined


We are developing a cluster with Akka.NET v1.4.38. The seed nodes communicate with an external system using Akka.IO.TCP, and multiple client nodes send messages to and receive messages from the seed nodes. If a client node loses communication with the cluster, it gets quarantined, so we need to restart the Akka actor system on that node. We created an actor that listens for AssociationErrorEvent and ThisActorSystemQuarantinedEvent and restarts the system when it receives either message.

public class ErrorManagerActor: ReceiveActor {
    public ErrorManagerActor(Action action) {
        Receive<ThisActorSystemQuarantinedEvent>(m => {
            action();
        });
        Receive<AssociationErrorEvent>(m => {
            action();
        });
    }
}

The problem is that the actor system never stops, and a warning is shown in the console:

[CoordinatedShutdown (akka://xxxxx)] Coordinated shutdown phase [actor-system-terminate] timed out after 00:00:10

We created a unit test to reproduce the problem.

    [Test]
    public void TerminateSystemTest() {
        var actor = Sys.ActorOf(Props.Create<ErrorManagerActor>(() => {
            if (!Sys.Terminate().Wait(10000))
                Assert.Fail("Unable to terminate actor system");
            terminatedEvent.Set();
        }));
        Sys.EventStream.Subscribe(actor, typeof(AssociationErrorEvent));
        Sys.EventStream.Subscribe(actor, typeof(ThisActorSystemQuarantinedEvent));
        var cluster = Cluster.Get(Sys);
        Sys.EventStream.Publish(new ThisActorSystemQuarantinedEvent(cluster.SelfAddress, cluster.SelfAddress));
        terminatedEvent.WaitOne();
    }

Solution

  • Your test fails because the ActorSystem cannot terminate until it has stopped every actor, including the one running your test assertion. That actor is blocked in Task.Wait on Sys.Terminate() inside its own message handler, so it can never stop and the termination task can never complete. The result is a deadlock.

    To fix this in a production system, don't block on the Task inside the actor.
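A minimal sketch of what the non-blocking version might look like, assuming the restart logic can run as a continuation once the system has stopped. The `onTerminated` parameter name, the `_shuttingDown` guard, and the self-subscription in `PreStart` are illustrative choices, not part of the original code:

```csharp
using System;
using Akka.Actor;
using Akka.Remote;

public class ErrorManagerActor : ReceiveActor
{
    private readonly Action _onTerminated;   // hypothetical callback, e.g. "start a new ActorSystem"
    private bool _shuttingDown;              // guard so the callback is only scheduled once

    public ErrorManagerActor(Action onTerminated)
    {
        _onTerminated = onTerminated;
        Receive<ThisActorSystemQuarantinedEvent>(_ => BeginShutdown());
        Receive<AssociationErrorEvent>(_ => BeginShutdown());
    }

    protected override void PreStart()
    {
        // Subscribe here so callers do not have to wire up the event stream.
        Context.System.EventStream.Subscribe(Self, typeof(ThisActorSystemQuarantinedEvent));
        Context.System.EventStream.Subscribe(Self, typeof(AssociationErrorEvent));
    }

    private void BeginShutdown()
    {
        if (_shuttingDown) return;
        _shuttingDown = true;

        var system = Context.System;
        var callback = _onTerminated;

        // Schedule the callback for after termination, off the actor's thread.
        system.WhenTerminated.ContinueWith(_ => callback());

        // Start termination and return immediately. Do NOT call .Wait() here:
        // this actor has to finish handling the message before it can be stopped.
        system.Terminate();
    }
}
```

In the unit test, the same idea applies: have the actor only trigger the termination, and do the waiting on the test thread instead, for example with `Assert.IsTrue(Sys.WhenTerminated.Wait(TimeSpan.FromSeconds(10)))` after publishing the event.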