Search code examples
distributed-computingdistributeddistributed-systemconsensuspaxos

Commit Failure in Paxos


I am new to the Distributed System and Consensus Algorithm. I understand how it works but I am confused by some corner cases: when the acceptors received an ACCEPT for an instance but never heard back about what the final consensus or decision is, what will the acceptors react. For example, the proposer is robooting or failed during commit or right after it sends all the ACCEPT. What will happen in this case?

Thanks.


Solution

  • There are two parts to this question: How do the acceptors react to new proposals? and How do acceptors react if they never learn the result?

    In plain-old paxos, the acceptors never actually need to know the result. In fact it is perfectly reasonable that different acceptors have different values in their memory, never knowing if the value they have is the committed value.

    The real point of paxos is to deal with the first question. And seeing that the acceptor never actually knows if it has the committed value, it has to assume that it could have the committed but be open to replacing its value if it doesn't have the committed value. How does it know? When receiving a message the proposer always compares the round number and if that is old then the acceptor signals to the proposer that it has to "catch up" first (a Nack). Otherwise, it trusts that the proposer knows what it is doing.


    Now for a word about real systems. Some real paxos systems can get away with the acceptors not caring what the committed value is: Paxos is just there to choose what the value will be. But many real systems use Paxos & Friends to make redundant copies of the data for safekeeping.

    Some paxos systems will continue paxos-ing until all the acceptors have the data. (Notice that without interference from other proposers, an extra paxos round copies the committed value everywhere.) Others systems are wary about interference from other proposers and will use a different Committed message that teach the acceptors (and other Learners) what the committed value is.

    But what happens if the proposer crashes? A subsequent proposer can come along and propose a no-op value. If the subsequent proposer Prepares (Phase 1A) and can communicate with ANY of the acceptors that the prior proposer successfully sent Accepts to (Phase 2A) then it will know what the prior proposer was trying to do (via the response in Phase 1B: PrepareAck). Otherwise a harmless no-op value gets committed.