Akka cluster with one master node, worker nodes and non-cluster client nodes


I am building an Akka cluster with 2.6.6. I am setting up a master node, which will be the seed node, and worker nodes that can dynamically leave or enter the cluster. I also have "client" nodes that should talk to the master node, possibly through a router, but not to the workers directly.

The problem right now is that if too many workers shut down without properly leaving the cluster, the split brain downing provider sometimes unelects the master node as leader and hence shuts it down as well. The "client" nodes are currently also part of the cluster, so they get shut down in that case too, which should not happen.

Is there a way to pin the leader to the master node and still have auto-downing for the workers, without downing the client nodes as well?

EDIT:

To put it in a more structured way, this is what I would like to accomplish:

  • The master node never shuts down automatically; if it crashes, it will be restarted manually
  • Worker nodes shut down if the master node is not available
  • Non-worker client nodes never shut down, but try to reconnect to the master node indefinitely if it is not available

Solution

  • Assuming you're using the now-open-sourced (formerly Lightbend commercial) split-brain resolver, the static-quorum strategy seems like a good fit. From the Akka documentation:

    The decision can be based on nodes with a configured role instead of all nodes in the cluster. This can be useful when some types of nodes are more valuable than others. You might, for example, have some nodes responsible for persistent data and some nodes with stateless worker services. Then it is probably more important to keep as many persistent data nodes as possible even though it means shutting down more worker nodes.

    There is another use of the role as well. By defining a role for a few (e.g. 7) stable nodes in the cluster and using that in the configuration of static-quorum you will be able to dynamically add and remove other nodes without this role and still have good decisions of what nodes to keep running and what nodes to shut down in the case of network partitions. The advantage of this approach compared to keep-majority (described below) is that you do not risk splitting the cluster into two separate clusters, i.e. a split brain. You must still obey the rule of not starting too many nodes with this role as described above. It also suffers the risk of shutting down all nodes if there is a failure when there are not enough nodes with this role remaining in the cluster, as described above.

    This could be accomplished with the following in application.conf:

    akka.cluster.split-brain-resolver.active-strategy=static-quorum
    
    akka.cluster.split-brain-resolver.static-quorum {
      # one leader node at a time
      quorum-size = 1
      role = "leader"
    }
    
    akka.cluster.roles = [ ${AKKA_CLUSTER_ROLE} ]
    

    You would then specify the cluster role for each instance via the environment variable AKKA_CLUSTER_ROLE (setting it to leader on your leader node and worker or client as appropriate).

    Since all nodes in the cluster are required to agree on the SBR strategy, client nodes that are cluster members cannot be exempted from it: the best you can do is have the client nodes die if the leader goes away.

    I'll take this opportunity at the end to point out that having client nodes join an Akka cluster is perhaps a design decision worth revisiting: it strikes me as being well on the way to a distributed monolith. I'd hope that having clients interact with the cluster via HTTP or a message queue was seriously considered.
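
To make the effect of the static-quorum settings above more concrete, here is a rough sketch in plain Python that models the decision rule as the quoted documentation describes it. It is not Akka code and does not use any Akka API: the function names `side_survives` and `max_members_with_role` are made up for illustration, and the model assumes each node carries exactly one role (leader, worker or client). The bound of quorum-size * 2 - 1 nodes with the role is the limit the Akka documentation gives for static-quorum.

```python
def side_survives(reachable_roles, quorum_size=1, role="leader"):
    """Return True if this side of a partition would keep running.

    Simplified model of static-quorum with a role: only reachable members
    carrying `role` are counted, and the side survives when that count is
    at least `quorum_size`. Otherwise the side downs itself.
    """
    counted = sum(1 for r in reachable_roles if r == role)
    return counted >= quorum_size


def max_members_with_role(quorum_size):
    """Upper bound on members carrying the role: quorum-size * 2 - 1."""
    return quorum_size * 2 - 1


if __name__ == "__main__":
    # Workers vanish without leaving: the side that still sees the leader stays up.
    print(side_survives(["leader", "worker", "client"]))  # True
    # The leader is gone: workers and clients that only see each other shut down.
    print(side_survives(["worker", "worker", "client"]))  # False
    # With quorum-size = 1, only one node may carry the leader role.
    print(max_members_with_role(1))  # 1
```

Under these assumptions, the first two requirements from the question are met (the leader is never downed because of lost workers, and workers go down without the leader), while the second scenario shows why client nodes inside the cluster cannot stay up once the leader is unreachable.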