Search code examples
algorithmclouddistributed-systemconsistencypaxos

Why is it legit to take the next two commands to fill gaps between paxos events?


There is a point in Paxos algorithm (http://research.microsoft.com/en-us/um/people/lamport/pubs/paxos-simple.pdf) that I do not understand. It's about how to deal with the gaps, the paper describe two ways as below:

The leader, as well as any other server that learns all the commands the leader knows, can now execute commands 1–135. However, it can’t execute commands 138–140, which it also knows, because commands 136 and 137 have yet to be chosen. The leader could take the next two commands requested by clients to be commands 136 and 137. Instead, we let it fill the gap immediately by proposing, as commands 136 and 137, a special “no- op” command that leaves the state unchanged. (It does this by executing phase 2 of instances 136 and 137 of the consensus algorithm.) Once these no-op commands have been chosen, commands 138–140 can be executed.

  1. take the next two commands requested by clients
  2. special “no- op” command

The second option has been mentioned Why is it legit to use no-op to fill gaps between paxos events.

And My question is about the first one. In my opinion, take the next two commands will violate the consistency, since the instance happened later may be have a smaller sequence number. So why it is still legit?


Solution

  • Since all clients see the same consistent outcome there isn't a violation of consistency. So there isn't a violation of the algorithms invariants.

    If you consider the scenario where all the commands come from a single client then it would be a reordering compared to the order the client sent the values. If a single client is multi-threaded and if it streams multiple concurrent requests the reordering may be harmless (or not, depending on the application semantics). If you consider that the leader uses noops then it effectively just drops some messages which may not be harmless to a client that depends on the ordering of values it streams. It depends on the application.

    If you consider the scenario where all the values come from different clients then the situation is far more natural. Under averse conditions some reordering occurs. Yet under normal running that doesn't happen. The reordering it just looks like some values "took longer than normal" to be fixed by a leader while later values issued by other clients "ran faster".