Search code examples
distributed-computingconsensus

after consensus is achieved among n=3f+1 replicas, why client need to look at just f+1 messages with same content?


I've read in many research papers that after distributed consensus is achieved among n=3f+1 replicas, where f is the number of faulty replicas tolerated, the client just needs to take as the correct output the one contained in at least f+1 messages. What happens if byzantine replicas are f+1? Of course consensus cannot be achieved, but the f+1 byzantine nodes are still able to send a message to the client. What am I missing?


Solution

  • The safety of the algorithm is not guaranteed when there are more than f faulty nodes. When there are f+1 or more faulty nodes, they can indeed force the client to accept arbitrary response.

    The client only needs to receive f+1 responses because a non-faulty node won't proceed to commit a message unless it receives 2f+1 COMMIT messages (including its own) with the same message digest, view and sequence number. This means there is at least one "correct" response in f+1 responses the client receives because there has to be one node overlapped between any f+1 and any 2f+1 set of nodes.

    For example, if there are 7 nodes and f = 2, a non-faulty node can't commit unless it receives 4 other COMMIT messages that match the same signature. Assuming the primary is non-faulty, you can't select 3 nodes (f+1) without having at least one of them being non-faulty. This means the client can't receive f+1 faulty responses.