In Practical Byzantine Fault Tolerance(PBFT), the reason why 3f+1
is needed as the way I understand is to allow for the worst case scenario where:
1. f+1 nodes are normal
2. f nodes are unresponsive
3. f nodes are faulty
So in the PREPARE
phase, how is it possible for each node to receive 2f
similar PREPARE
messages from other nodes to start the COMMIT
phase?
Since only f+1
nodes can reliably send the same PREPARE
message, each node should only receive f
of the same PREPARE
message (not counting their own). So how is it possible for them to receive 2f
PREPARE
message to start the quorum and proceed to the next stage?
The following assertion is incorrect
1. f+1 nodes are normal
2. f nodes are unresponsive
3. f nodes are faulty
3f+1 means that PBFT tolerates up to f faulty nodes only, where faulty means unavailable, unresponsive or malicious.
PBFT is focused on satisfying safety (results are valid and identical at all nodes) and liveness (nodes that don’t fail always produce a result) properties.
To achieve liveness, there must be a non-failed quorum (Q) available. So given N nodes and f faulty nodes, you end up with
Q <= N-f
For safety, the intersection of two quorums must contain at least one non-faulty node. So given N nodes and quorum size Q, the following must be true for two intersecting quorums:
2Q - N > f => N + f < 2Q (because all f nodes can be malicious) Recall for liveness Q <= N-f, so
2(N-f) - N > f => N > 3f
Assume N = 3f + 1 and again for liveness N + f < 2Q => (3f + 1) + f < 2Q which means that the minimum quorum size for safety in the byzantine case is now 2f+1. And since a quorum must be available in the presence of f faulty nodes (2f+1+f), you get 3f+1.
There are more formal ways to demonstrate the proof, but hopefully this helps.