Search code examples
distributedpaxos

Why does the proposer sends an accept request with the same value it got from the acceptor?


In the second phase of the paxos algorithm, the proposer issues an accept request with the number n and the value v it got from the acceptor, if the acceptor has already chosen a value previously. My questions is why the proposer is doing this? Because once a value if chosen it is permanent and cannot be changed, so in this case the proposer is just learning the chosen value, which was sent in the response of the prepare request. Why would it asks to accept a value already accepted?


Solution

  • It is necessary that the value chosen is consistent with that proposed by the last leader else a value which had been chosen can be lost. A helpful way of thinking about it is that the new proposer chooses to collabate with the old proposer. If it does not collaborate then a contradiction can occur and we can get an inconsistency across the distributed system.

    Example:

    Consider Nodes A, B and C acting all roles of multi-paxos. Node A is leader and propose V1. Imagine that the network fails and only Nodes A and B are able to communicate and only a minimum number of messages get through for Node A to know that V1 is chosen.

    When Node A hears from Node B it knows V1 is chosen as it has a majority (Nodes A and B). It sends message to Nodes B and C to say that the value is chosen however as stated in this example no further messages get through from Node A. Node A perform business actions such as money being paid out of a bank account of amount V1. Node A then crashes.

    Node C now becomes leader and doesn't know the correct bank account balance nor anything about the fact a payment had even been suggested. Node B knows a payment was suggested in V1 but not whether it was chosen as it never heard the outcome from Node A. So Node B also doesn't know the correct bank account balance.

    The mechanism you describe is exactly how Node C comes to collaborate with the dead Node A in choosing value V1. If no further messages are lost both B and C will get into a consistent state where they agree with the amount that has been paid out of the bank account.

    Clearly if Node C was not to discover value V1 via Node B and was to propose some new value we would have a contradiction. The bank account would be corrupted as the payment V1 will not be reflected in the balance of the account on each of Nodes B and C.

    Discussion

    There is a detailed discussion about the mechanism you are asking about on my blog post which describes it the leader takeover phase.

    There are some standard implementation details assumed in the events as I describe them above. For example, one might say "don't move the money from the bank account without more messages to confirm that all nodes are aware the value is chosen". Yet Paxos is proven to only need the minimum number of messages to be safe and crashes should only be rare. This means that when implementing Paxos it is usually optimal to use the minimum number of messages during normal running and to rely on the algorithm to recover a consistent state across the system during failure scenarios.

    It is interesting that a value can be chosen yet no living node knows about it. In the above example Node A runs long enough to see the messages from Node B and move money between bank accounts. Yet it might have crashed before hearing from Node A. It will have accepted V1, in addition to Node B, yet no Node knows that the value has been chosen until Node C discovers V1 and also chooses it.

    It is interesting that the clients of the system, those things observing the bank account, or other systems receiving a payment from the bank account, are also a part of the distributed system. Were it the case that no payments were made from bank accounts in the example then loosing V1 would not be a problem. Yet it is fairly normal that there are side effects of values being chosen. The 3rd party systems, or user web browsers that observe the system, are actually part of the distributed system.