Search code examples
distributed-systemconsensusraft

Does this cause a real problem when I adopt the Raft's "never commits log entries from previous terms by counting replicas" rule in this situation?


I am currently implementing the Raft consensus algorithm myself, and I meet with the following problem. Consider there are 4 nodes(A, B, C and D), so a log entry can be committed with more than 2 votes. Now we start the cluster and have Leader A elected with term = 0. Then the following things happen:

  1. Follower B/D disconnect.
  2. Leader A create LogEntry X.
  3. Leader A try to replicate to all nodes and fails eventually because only 2 nodes(A and C).
  4. Node B reconnect and timeout, it starts a election with new term = 1.
  5. Node A lost its leadership, because it received Node B's RequestVote RPC.
  6. Node B can't win the election, because it has no LogEntry X. So there are no Leader in the cluster.
  7. Node A timeout and be elected as Leader again.
  8. Leader A successfully replicate LogEntry X to B.
  9. Now node A/B/C have exactly the same LogEntry X, which is (index = 0, term = 0). However, according to the Raft paper, Leader A can't commit X, though it's generated by itself and a majority agreed on X.

    Raft never commits log entries from previous terms by counting replicas. Only log entries from the leader’s current term are committed by counting replicas;

  10. Suppose there are no more LogEntrys from client to replicate, so LogEntry X remains uncommitted.

My questions are:

  1. Is this a real problem?
  2. Are there some solutions to this? In fact there are already some posts over SoF which emphasize the importance of this rule. And in this post, it seems to say we can create a copy Y of X, and replicate Y. Does it work or maybe there exists a common solution?

Solution

    1. No

    2. Yes. In raft paper, on page 13., you have the following:

    The Leader Completeness Property guarantees that a leader has all committed entries, but at the start of its term, it may not know which those are. To find out, it needs to commit an entry from its term. Raft handles this by having each leader commit a blank no-op entry into the log at the start of its term

    In your case, after step 7., A will create a NoOp Log entry and it will succeed in replicating it, commit it and thus all previous entries will be committed.