Environment: scala-2.11.x, akka-2.5.9
There are two hosts on EC2: H1 and H2.
There are three modules in the sbt project: master, client and worker.
Each module implements an akka-cluster node, which subscribes to cluster events and logs them. Each node also logs the cluster state every minute (for debugging). The following ports are used for the cluster nodes:
master: 2551
worker: 3000
client: 5000
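For reference, a minimal sketch of what such a node-side listener might look like (the actual code in the project may differ; the actor name and log messages here are only illustrative):

```scala
import akka.actor.{Actor, ActorLogging}
import akka.cluster.Cluster
import akka.cluster.ClusterEvent._
import scala.concurrent.duration._

// Illustrative listener: subscribes to cluster membership events and dumps
// the full cluster state once per minute, roughly as described above.
class ClusterListener extends Actor with ActorLogging {
  private val cluster = Cluster(context.system)
  import context.dispatcher

  override def preStart(): Unit = {
    cluster.subscribe(self, initialStateMode = InitialStateAsEvents,
      classOf[MemberEvent], classOf[UnreachableMember])
    context.system.scheduler.schedule(1.minute, 1.minute, self, "log-state")
  }

  override def postStop(): Unit = cluster.unsubscribe(self)

  def receive: Receive = {
    case "log-state"                   => log.info("Cluster state: {}", cluster.state)
    case MemberUp(member)              => log.info("Member is Up: {}", member.address)
    case UnreachableMember(member)     => log.info("Member detected as unreachable: {}", member.address)
    case MemberRemoved(member, status) => log.info("Member removed: {} after {}", member.address, status)
    case _: MemberEvent                => // ignore other membership events
  }
}
```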
The project is available at github. More details about the infrastructure are in my previous question.
A module can be redeployed to H1 or H2 at random.
There is a strange behavior of the akka-cluster when one of the nodes (for example worker) is redeployed. The following steps illustrate the deployment history:
The initial state: worker is deployed on H1, and master and client are deployed on H2.
----[state-of-deploying-0]---
H1 = [worker]
H2 = [master, client]
cluster status: // cluster works correctly
Member(address = akka.tcp://ClusterSystem@H1:3000, status = Up)
Member(address = akka.tcp://ClusterSystem@H2:2551, status = Up)
Member(address = akka.tcp://ClusterSystem@H2:5000, status = Up)
----------------
After that, the worker module is redeployed to host H2:
----[state-of-deploying-1]---
H1 = [-]
H2 = [master, client, worker (Redeployed)]
cluster status: // WRONG cluster state!
Member(address = akka.tcp://ClusterSystem@H1:3000, status = Up) // ???
Member(address = akka.tcp://ClusterSystem@H2:2551, status = Up)
Member(address = akka.tcp://ClusterSystem@H2:3000, status = WeaklyUp)
Member(address = akka.tcp://ClusterSystem@H2:5000, status = Up)
----------------
The above situation happens occasionally. In this case the cluster stores a wrong membership state and will not repair it:
Member(address = akka.tcp://ClusterSystem@H1:3000, status = Up) // ???
Host H1 no longer runs any instance of worker, and > telnet H1 3000 returns connection refused.
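One way to see this from inside any node is to dump the current membership together with the unreachable set; a minimal sketch (addresses as above):

```scala
import akka.actor.ActorSystem
import akka.cluster.Cluster

// Print every member with its status plus the unreachable set.
// The stale H1:3000 entry stays in `members` with status Up until it is downed.
def dumpClusterState(system: ActorSystem): Unit = {
  val state = Cluster(system).state
  state.members.foreach(m => println(s"member: ${m.address} status = ${m.status}"))
  state.unreachable.foreach(m => println(s"unreachable: ${m.address}"))
}
```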
But why does the akka-cluster keep storing this wrong info?
This behaviour is intended: an Akka cluster in production should be run without automatic downing, precisely to prevent split-brain problems.
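In plain Akka 2.5 this essentially means leaving automatic downing switched off, which is the default. A hedged sketch of the relevant node configuration (host, port and seed-node values are placeholders taken from the question):

```scala
import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory

object WorkerNode extends App {
  // Placeholder configuration for one node; the important part for this answer
  // is the last line: automatic downing stays disabled ("off" is the default).
  val config = ConfigFactory.parseString(
    """
      |akka.actor.provider = cluster
      |akka.remote.netty.tcp.hostname = "H2"
      |akka.remote.netty.tcp.port = 3000
      |akka.cluster.seed-nodes = ["akka.tcp://ClusterSystem@H2:2551"]
      |akka.cluster.auto-down-unreachable-after = off
    """.stripMargin).withFallback(ConfigFactory.load())

  val system = ActorSystem("ClusterSystem", config)
}
```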
Imagine a two-node cluster (A and B) with two clients (X and Y):

1. Messages from client Y, which is connected to B, can be forwarded through remoting to Actor1 running on A for processing.
2. A becomes unreachable from B, and B might be tempted to mark A as down and restart Actor1 locally.
3. Client Y now sends its messages to the Actor1 running on B.
4. Client X, on the other side of the network partition, is still connected to A and keeps sending messages to the Actor1 running on A.
This is a split-brain problem: the same actor with the same identifier is running on two nodes with two different states. It is very hard or impossible to rebuild the correct actor state.
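In the situation from the question this means the stale akka.tcp://ClusterSystem@H1:3000 incarnation will stay in the membership (eventually flagged as unreachable) until some node explicitly downs it. A minimal sketch of downing it manually from any healthy node (manual downing is just one possible strategy; the address values are the ones from the question):

```scala
import akka.actor.{ActorSystem, Address}
import akka.cluster.Cluster

object ManualDowning {
  // Remove the stale worker incarnation that is still registered at H1:3000.
  // `system` is the ActorSystem of any healthy cluster node.
  def downStaleWorker(system: ActorSystem): Unit =
    Cluster(system).down(Address("akka.tcp", "ClusterSystem", "H1", 3000))
}
```

The same operation is also exposed through the cluster's JMX MBean.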
To prevent this from happening, you have to pick a downing strategy that is reasonable for your problem and use case: