scala, akka

Why don't the first Receptionist.Subscribe messages in Akka contain all cluster members registered with the specified key?


I have two actors, all on localhost: the first is already in the cluster at port 25457, and the second is to be started at port 25458.

Both have the following behaviour:


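    import akka.actor.Address
    import akka.actor.typed.Behavior
    import akka.actor.typed.receptionist.{Receptionist, ServiceKey}
    import akka.actor.typed.receptionist.Receptionist.{Register, Subscribe}
    import akka.actor.typed.scaladsl.Behaviors
    import akka.cluster.typed.{Cluster, Join}

    // Message and OnlinePlayers(Set[ActorRef[Message]]) are the application's own protocol types
    def apply(address: Address): Behavior[Message] = Behaviors.setup { ctx =>
      val addressBookKey = ServiceKey[Message]("address_book")
      val listingResponseAdapter = ctx.messageAdapter[Receptionist.Listing] {
        case addressBookKey.Listing(p) => OnlinePlayers(p)
      }

      Cluster(ctx.system).manager ! Join(address)
      ctx.system.receptionist ! Register(addressBookKey, ctx.self)
      ctx.system.receptionist ! Subscribe(addressBookKey, listingResponseAdapter)

      Behaviors.receiveMessage { m =>
        System.err.println(m)
        Behaviors.same
      }
    }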

When the second actor joins, stderr prints Set(), then Set(Actor[akka://system/user#0]), and then Set(Actor[akka://system/user#0], Actor[akka://system@localhost:27457/user#0]).

When the second actor leaves, the first actor prints Set(Actor[akka://system/user#0]) twice.

How can the second actor directly receive all cluster participants?
Why does the first actor print the set twice after the second one leaves?

Thanks


Solution

  • Joining the cluster is an asynchronous process. Sending Join to the manager only triggers it; the node actually joins the cluster at some point after that. The receptionist can only know about services registered on nodes that have completed joining the cluster.

    This means that when you subscribe to the receptionist, joining the cluster has most likely not completed yet, so you first get the locally registered services (because of message ordering guarantees, the receptionist always receives the local Register message before the Subscribe). Once joining the cluster completes, the receptionist learns about services on other nodes and the subscriber is updated.

    To be sure other nodes are known, you would have to delay the subscription until the node has joined the cluster. This can be achieved by subscribing to cluster state changes and subscribing to the receptionist only after the node itself has been marked as Up.

    In general, it is often better to build something that works regardless of when the node joins the cluster, as this makes it easier to test and also lets you run the same component without a cluster. For example, the subscribing actor could switch behaviour depending on whether there are no registered services or at least one, or wait until a minimum number of services is registered for the service key.

    I'm not entirely sure why you see the duplicated update when the actor on the other node "leaves", but there are some specifics of the CRDT used for the receptionist registry where it may have to remove a service again for consistency, which could perhaps explain it.
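A minimal sketch of waiting until the node is Up before subscribing to the receptionist, assuming Akka 2.6 Typed with `akka-cluster-typed` on the classpath. `AddressBook`, `SelfJoined`, `seedAddress` and the `Message`/`OnlinePlayers` definitions are hypothetical stand-ins for the question's own types; `SelfUp`, `Join` and the cluster `Subscribe`/`Unsubscribe` commands come from `akka.cluster.typed`. The subscriber still receives further listings whenever the registry changes later.

```scala
import akka.actor.Address
import akka.actor.typed.{ActorRef, Behavior}
import akka.actor.typed.receptionist.{Receptionist, ServiceKey}
import akka.actor.typed.scaladsl.Behaviors
import akka.cluster.typed.{Cluster, Join, SelfUp, Subscribe => ClusterSubscribe, Unsubscribe => ClusterUnsubscribe}

object AddressBook {
  // Hypothetical protocol standing in for the question's Message/OnlinePlayers types
  sealed trait Message
  final case class OnlinePlayers(players: Set[ActorRef[Message]]) extends Message
  private final case class SelfJoined(event: SelfUp) extends Message

  val addressBookKey: ServiceKey[Message] = ServiceKey[Message]("address_book")

  def apply(seedAddress: Address): Behavior[Message] = Behaviors.setup { ctx =>
    val cluster = Cluster(ctx.system)

    // Ask to be notified once this node itself has been marked as Up
    val selfUpAdapter: ActorRef[SelfUp] = ctx.messageAdapter[SelfUp](event => SelfJoined(event))
    cluster.subscriptions ! ClusterSubscribe(selfUpAdapter, classOf[SelfUp])

    // Joining is asynchronous: this only starts the process
    cluster.manager ! Join(seedAddress)
    // Registering early is fine; other nodes see it once the registry is replicated to them
    ctx.system.receptionist ! Receptionist.Register(addressBookKey, ctx.self)

    Behaviors.receiveMessagePartial {
      case SelfJoined(_) =>
        cluster.subscriptions ! ClusterUnsubscribe(selfUpAdapter)
        // Only now subscribe to the receptionist
        val listingAdapter = ctx.messageAdapter[Receptionist.Listing] {
          case addressBookKey.Listing(refs) => OnlinePlayers(refs)
        }
        ctx.system.receptionist ! Receptionist.Subscribe(addressBookKey, listingAdapter)
        Behaviors.receiveMessagePartial {
          case OnlinePlayers(players) =>
            System.err.println(players)
            Behaviors.same
        }
    }
  }
}
```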
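The suggestion of switching behaviour based on how many services are registered could look roughly like the following. It is a sketch under stated assumptions, not a prescribed pattern: `PlayerLobby`, `Command`, `ListingUpdated`, `minPlayers` and `enoughPlayers` are hypothetical names. Because the actor only reacts to listings, it behaves the same whether the node joins a cluster early, late, or not at all.

```scala
import akka.actor.typed.{ActorRef, Behavior}
import akka.actor.typed.receptionist.{Receptionist, ServiceKey}
import akka.actor.typed.scaladsl.Behaviors

object PlayerLobby {
  // Hypothetical protocol for illustration
  sealed trait Command
  private final case class ListingUpdated(players: Set[ActorRef[Command]]) extends Command

  val playerKey: ServiceKey[Command] = ServiceKey[Command]("address_book")

  // Pure decision, kept separate so it can be tested without a cluster
  def enoughPlayers[A](players: Set[A], minPlayers: Int): Boolean =
    players.size >= minPlayers

  def apply(minPlayers: Int): Behavior[Command] = Behaviors.setup { ctx =>
    val listingAdapter = ctx.messageAdapter[Receptionist.Listing] {
      case playerKey.Listing(refs) => ListingUpdated(refs)
    }
    ctx.system.receptionist ! Receptionist.Register(playerKey, ctx.self)
    // Subscribing right away is fine: later listings will bring in remote services
    ctx.system.receptionist ! Receptionist.Subscribe(playerKey, listingAdapter)
    waiting(minPlayers)
  }

  // Too few services known so far: wait for the next listing
  private def waiting(minPlayers: Int): Behavior[Command] =
    Behaviors.receiveMessage {
      case ListingUpdated(players) if enoughPlayers(players, minPlayers) =>
        ready(minPlayers, players)
      case ListingUpdated(_) =>
        Behaviors.same
    }

  // Enough services: report them on entry, and fall back to waiting if some disappear
  private def ready(minPlayers: Int, players: Set[ActorRef[Command]]): Behavior[Command] = {
    System.err.println(s"${players.size} players online: $players")
    Behaviors.receiveMessage {
      case ListingUpdated(updated) if enoughPlayers(updated, minPlayers) =>
        ready(minPlayers, updated)
      case ListingUpdated(_) =>
        waiting(minPlayers)
    }
  }
}
```

Keeping the threshold check in a plain function means the decision logic can be unit tested without starting an actor system.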
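If repeated listings like the duplicated update in the question matter to your logic, one option (an assumption, not something the receptionist requires) is to remember the last set seen and ignore listings that do not change it. `OnlinePlayersTracker`, `changed` and `tracking` are hypothetical names.

```scala
import akka.actor.typed.{ActorRef, Behavior}
import akka.actor.typed.receptionist.{Receptionist, ServiceKey}
import akka.actor.typed.scaladsl.Behaviors

object OnlinePlayersTracker {
  // Hypothetical protocol for illustration
  sealed trait Message
  private final case class OnlinePlayers(players: Set[ActorRef[Message]]) extends Message

  val addressBookKey: ServiceKey[Message] = ServiceKey[Message]("address_book")

  // Some(current) if the set differs from the last one seen, None for a repeat
  def changed[A](previous: Set[A], current: Set[A]): Option[Set[A]] =
    if (previous == current) None else Some(current)

  def apply(): Behavior[Message] = Behaviors.setup { ctx =>
    val listingAdapter = ctx.messageAdapter[Receptionist.Listing] {
      case addressBookKey.Listing(refs) => OnlinePlayers(refs)
    }
    ctx.system.receptionist ! Receptionist.Subscribe(addressBookKey, listingAdapter)
    tracking(Set.empty)
  }

  private def tracking(known: Set[ActorRef[Message]]): Behavior[Message] =
    Behaviors.receiveMessage { case OnlinePlayers(current) =>
      changed(known, current) match {
        case Some(updated) =>
          System.err.println(updated) // printed only when the set actually changed
          tracking(updated)
        case None =>
          Behaviors.same // repeated listing: nothing new to react to
      }
    }
}
```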