Search code examples
distributed-computingdistributedpaxos

Is keep logging messages in group communication service or paxos practical?


In the case of network partition or node crash, most of the distributed atomic broadcast protocols (like Extended Virtual Synchrony or Paxos), require running nodes, to keep logging messages, until the crashed or partitioned node rejoins the cluster. When a node rejoins the cluster, replay of logged messages are enough to regain the current state.

My question is, if the partitioned/crash node takes really long time to join the cluster again, then eventually logs will overflow. This seem to be a very practical issue, but still no one in their paper talks about it. Is there a very obvious solution to this which I am missing? Or my understanding in incorrect.


Solution

  • You don't really need to remember the whole log. Imagine for example that the state you were synchronizing between the nodes was something like an SQL table with a row of the form (id: int, name: string) and the commands that would be written into the logs were in a form "insert row with id=x and name=y", "delete row where id=z", "set name=a where id=1000",...

    Once such commands were committed, all you really care about is the final table. Then once a node which was offline for a long time goes online, it would only need to download the table + few entries from the log that were committed while the table was being downloaded.

    This is called "log compaction", check out the chapter 7 in the Raft paper for more info.