Search code examples
apache-zookeeperfault-tolerance

what does Zookeeper fault tolerant exactly mean ? simultaneously Or accumulatively?


As mentioned in the ZooKeeper Getting Started Guide , a minimum of three servers are required for a fault tolerant clustered setup, and it is strongly recommended that you have an odd number of servers.

So If I got 5 servers, and as mentioned above I can still survive when 2 of them failed.But It means simultaneously Or accumulatively ??

So how about this :
5 servers -> fail one -> 4 servers -> fail one -> 3 servers -> fail one -> 2 servers -> fail one -> die

And what's the difference between 3 servers(initialization) and 3 servers (degeneration from 5 servers) ??


Solution

  • For Zookeeper cluster to work, it needs quorum. And quorum is the majority of servers from the cluster.

    • With a 3 node cluster, the majority is 2 nodes. So you can tolerate only 1 node not being in sync at the same time.
    • With a 5 node cluster, the majority is 3 nodes. So you can tolerate only 2 nodes not being in sync at the same time.
    • With a 7 node cluster, the majority is 4 nodes. So you can tolerate only 3 nodes not being in sync at the same time.

    What does being in sync mean? The node is not part of the quorum not only when it is not running. But also when it is still rejoining the cluster after a failure.

    The nodes are hardcoded in Zookeeper configuration. So each node in the cluster know that it should be part of a cluster with N nodes. Therefore it doesn't work in the way that a 7 node cluster where two nodes are down is suddenly a 5 node cluster and another 2 nodes can go down. It will always behave as a 7 node cluster and only 3 nodes can go down unless you change the configuration files.

    The whole thing about even and odd number of nodes is basically about the number of nodes which could be down while maintaining the quorum. And with 4 node cluster, the majority will be 3. So 4 node cluster can still tolerate only 1 node being down. Hence it doesn't make much sense to use 4 node cluster which has the same fault tolerance as the 3 node cluster.