I've recently been learning Paxos, and by now I have a basic understanding of how it works. But can anyone explain how Paxos handles packet loss and a new node joining? A simple example would be even better.
The classical Paxos algorithm has no concept of "new nodes joining". Some Paxos variants do, such as "Vertical Paxos", but the classic algorithm requires that all nodes be statically defined before the algorithm runs. With respect to packet loss, Paxos uses a very simple infinite loop: "try a round of the algorithm; if anything at all goes wrong, try another round". So if too many packets are lost in the first attempt at reaching a decision (which can be detected with a simple timeout while waiting for replies), a second round can be attempted. If the timeout for that round also expires, try again, and so on.
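To make that loop concrete, here is a toy single-proposer sketch (all class and function names are mine, and "packet loss" is simulated by randomly dropping replies, which to the proposer looks exactly like a timeout). If a round fails to collect a quorum in either phase, the proposer simply starts a new round with a higher proposal number:

```python
import random

class Acceptor:
    """Minimal Paxos acceptor state (illustrative, not a real implementation)."""
    def __init__(self):
        self.promised_n = 0          # highest prepare seen so far
        self.accepted_n = 0          # highest proposal number accepted
        self.accepted_value = None

    def prepare(self, n):
        if n > self.promised_n:
            self.promised_n = n
            return (self.accepted_n, self.accepted_value)  # promise
        return None  # reject (proposer treats this like a lost packet)

    def accept(self, n, value):
        if n >= self.promised_n:
            self.promised_n = self.accepted_n = n
            self.accepted_value = value
            return True
        return False

def lossy(reply, loss_rate, rng):
    """A dropped message is indistinguishable from no reply (a timeout)."""
    return None if rng.random() < loss_rate else reply

def propose(value, acceptors, loss_rate=0.3, max_rounds=1000, seed=42):
    rng = random.Random(seed)
    quorum = len(acceptors) // 2 + 1
    # "try a round; if anything at all goes wrong, try another round"
    for n in range(1, max_rounds + 1):
        # Phase 1 (prepare) -- some replies may be "lost"
        promises = [lossy(a.prepare(n), loss_rate, rng) for a in acceptors]
        promises = [p for p in promises if p is not None]
        if len(promises) < quorum:
            continue  # timeout: retry with a higher proposal number
        # Must re-propose the highest-numbered value already accepted, if any
        prior = [p for p in promises if p[1] is not None]
        if prior:
            value = max(prior)[1]
        # Phase 2 (accept)
        acks = [lossy(a.accept(n, value), loss_rate, rng) for a in acceptors]
        if sum(1 for ok in acks if ok) >= quorum:
            return n, value  # the value is now chosen
    raise RuntimeError("gave up after max_rounds")

acceptors = [Acceptor() for _ in range(3)]
round_no, chosen = propose("x", acceptors)
```

Even with 30% of replies dropped, the proposer eventually gets a quorum in both phases of some round; losing a reply after an acceptor has already accepted is harmless, because a later round will rediscover that value in phase 1.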
Exactly how packet loss is detected and handled is something the Paxos algorithm leaves undefined; it's an implementation-specific detail. This is actually a good thing for production environments, since the choice of strategy can have a big performance impact on Paxos-based systems.
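As one concrete example of such an implementation choice (not part of Paxos itself): many systems wait a randomized, exponentially growing delay before retrying a round, so that two competing proposers stop repeatedly preempting each other. A hypothetical sketch, with all parameter values chosen purely for illustration:

```python
import random

def backoff_delays(base=0.05, cap=2.0, attempts=6, rng=None):
    """Exponential backoff with "full jitter": before retry k, sleep a
    random amount between 0 and min(cap, base * 2**k) seconds.
    All names and constants here are illustrative, not from any spec."""
    rng = rng or random.Random(0)
    delay = base
    delays = []
    for _ in range(attempts):
        delays.append(rng.uniform(0, delay))  # randomization breaks ties
        delay = min(cap, delay * 2)           # grow the window, up to a cap
    return delays

delays = backoff_delays()
```

Whether you use fixed timeouts, adaptive timeouts, or backoff like this doesn't affect Paxos's safety, only its liveness and latency, which is why the algorithm can afford to leave it unspecified.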