hyperledger-fabric blockchain fault-tolerance

How does each backup/nodes get 2f replies in PBFT?

In Practical Byzantine Fault Tolerance(PBFT), the reason why 3f+1 is needed as the way I understand is to allow for the worst case scenario where:

1. f+1 nodes are normal
2. f nodes are unresponsive
3. f nodes are faulty

So in the PREPARE phase, how is it possible for each node to receive 2f similar PREPARE messages from other nodes to start the COMMIT phase?

Since only f+1 nodes can reliably send the same PREPARE message, each node should only receive f of the same PREPARE message (not counting their own). So how is it possible for them to receive 2f PREPARE message to start the quorum and proceed to the next stage?

Solution

The following assertion is incorrect

1. f+1 nodes are normal
2. f nodes are unresponsive
3. f nodes are faulty

3f+1 means that PBFT tolerates up to f faulty nodes only, where faulty means unavailable, unresponsive or malicious.

PBFT is focused on satisfying safety (results are valid and identical at all nodes) and liveness (nodes that don’t fail always produce a result) properties.

To achieve liveness, there must be a non-failed quorum (Q) available. So given N nodes and f faulty nodes, you end up with

Q <= N-f

For safety, the intersection of two quorums must contain at least one non-faulty node. So given N nodes and quorum size Q, the following must be true for two intersecting quorums:

2Q - N > f => N + f < 2Q (because all f nodes can be malicious) Recall for liveness Q <= N-f, so

2(N-f) - N > f => N > 3f

Assume N = 3f + 1 and again for liveness N + f < 2Q => (3f + 1) + f < 2Q which means that the minimum quorum size for safety in the byzantine case is now 2f+1. And since a quorum must be available in the presence of f faulty nodes (2f+1+f), you get 3f+1.

There are more formal ways to demonstrate the proof, but hopefully this helps.