Search code examples
distributed-systemfault-toleranceraft

How a typical cluster of five servers can tolerate the failure of any two servers?


I am reading Raft-extended paper and above statement was there. Also I found a statement in the web saying failures of f servers can be tolerated if there were 2*f+1 servers. It's obvious to have another two servers where f=1. Is there an inductive way to prove it?


Solution

  • It is intuitive once you realize that for any operation to be successful, it must complete successfully on a majority of the servers. i.e. in case of a discrepancy, at least quorum number of servers must agree on the same value.

    For 2 servers, both servers must be in agreement for a majority.

    For 3 servers, at least 2 must be in agreement for a majority.

    For 4 servers, at least 3 must be in agreement for a majority.

    For 5 servers, at least 3 must be in agreement for a majority.

    i.e. for n servers, n/2 +1 servers must be in agreement.

    So for n servers, the number of servers that can afford to fail is n - (n/2+1).

    Which mean for 2n servers, it is 2n - (2n/2 + 1).

    Therefore for 2n +1 servers, it is 2n - (2n/2 + 1) + 1, which simplifies to n servers.