Search code examples
apache-zookeeperdistributed-computing

Quorum vs majority in Zookeper


I've been reading a lot about Zookeeper and one thing that I didn't understand yet is the naming convention. I couldn't find anything about it in the documentation and all stackoverflow questions and other resources uses the definition of quorum in different ways. So my question is, what exactly a quorum is?

  1. Is it the number of servers that we must have to achieve a healthy ensemble? This is: Q=2N+1 where N is the number of nodes that could go down without bring down the whole service? (failure number that we are willing to support). In this case, the initial total number of servers of the ensemble is what we call a quorum, so if we want to make our system support at most 2 failing nodes, Q=2*2+1=5, so the quorum is equals to the number of servers?
  2. A quorum is the majority number of servers that have to agree on a certain operation in order to move forward. For example, if T is the total number of servers, then a quorum is also the majority number given by (T+1)/2.

So, what is exactly a quorum? I saw many people using the same name in two different concepts.


Solution

  • It seems like people were using it indiscriminately, but both points are mutually exclusive. One thing is the number of servers to build a healthy ensemble, resilient and reliable to get quorum even if some node fails and the other states that a quorum is equivalent to that number. Also, the definition of quorum is:

    The minimum number of members of an assembly or society that must be present at any of its meetings to make the proceedings of that meeting valid.

    Which implies that we could have more members not available at the time we need to make a decision (in this case, servers that went down). More than that, I found an interesting note in the Zookeeper internals documentation:

    Atomic broadcast and leader election use the notion of quorum to guarantee a consistent view of the system. By default, ZooKeeper uses majority quorums, which means that every voting that happens in one of these protocols requires a majority to vote on. One example is acknowledging a leader proposal: the leader can only commit once it receives an acknowledgement from a quorum of servers.

    Quorums could also be configured in a hierarchical way or event configured weights on different servers.

    Conclusion:

    1. The number of servers to configure a reliable and resilient ensemble is a health metric, you can name it as you want but it is not a quorum.
    2. A quorum is the minimum number of servers that needs to vote or send acks on every decision in the ensemble, which by default is (T+1)/2 where T stands for the total number of servers.