distributed-computing distributed-system fault-tolerance

How to avoid loss of internal state of a master during fail-over to new master during a network partition

I was trying to implement a simple single master node against multiple backup nodes system to learn about distributed and fault tolerant architecture.

Currently this is what my system looks like:

N different nodes, each one identical. 1 master node running a simple webserver.
All nodes communicate with each other using simple heartbeat protocol and each maintain global state (count of nodes available, who is master, downtime and uptime of each other.)
If any node does not hear from master for some set time, if raises a alarm. If a consensus is reached that the master is down, new master is elected.
If the network of nodes gets partitioned.
- And the master is in minor partition, then it will stop serving request and go down by itself after a set period of time. Minor group cannot elect master (some minimum nodes require to make decision)
- New master gets selected in the major partition after a set time after not hearing from old master.

Now I am stuck with a problem, that is, in the step 4 above, there is a time gap where the old master is still serving the requests, while new master getting elected in the major node.

This seems can cause inconsistent data across the system if some client decided to write new data to old master. How we avoid this issue. Would be glad if someone points me to right direction.

Solution

Rather than accepting writes to the minority master, what you want is to simply reject writes to the old master in that case, and you can do so by attempting to verify its mastership with a majority of the cluster on each write. If the master is on the minority side of a partition, it will no longer be able to contact a majority of the cluster and so will not be able to acknowledge clients’ requests. This brief period of unavailability is preferable to losing acknowledged writes in quorum based systems.

You should read the Raft paper. You’re slowly moving towards an implementation of the Raft protocol, and it will probably answer many of the questions you might have alonggn the way.