apache-zookeeper distributed etcd raft paxos

How does Raft know whether a log entry from a previous term is committed?


While studying Raft, I ran into a question.

A Raft cluster has 5 servers; call them a, b, c, d, e. a is the leader and everything is working normally. Then a handles a client request and creates a log entry.

Scenario 1: b and c replicate the log entry; d and e don't. Then a and b crash. c has the log entry; d and e do not. The entry is committed.

Scenario 2: b replicates the log entry; c, d and e don't. Then a and e crash. b has the log entry; c and d do not. The entry is not committed.

How does Raft handle these two cases?


Solution

  • The step where A handles a client request and makes a log entry needs one more detail: A appends the entry to its own log, sends it to the followers, and then waits until at least 2 of (b, c, d, e) have accepted it before it treats the entry as committed.

    Since there are five nodes - one leader and four followers - a majority requires three nodes: the leader and any two followers.

    So the leader marks the entry as committed once at least two followers have accepted it.

    When a follower accepts an entry, it does not mean the entry is committed. A follower commits an entry only after the leader tells it to do so.
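The majority arithmetic can be checked with a minimal sketch. The helper names `majority` and `stored_on_majority` are made up for illustration and are not part of any Raft library; the sketch only counts copies and ignores terms and leader bookkeeping.

```python
def majority(cluster_size: int) -> int:
    """Smallest number of nodes that forms a majority."""
    return cluster_size // 2 + 1


def stored_on_majority(holders: set[str], cluster: set[str]) -> bool:
    """True if the entry is stored on a majority of the cluster."""
    return len(holders & cluster) >= majority(len(cluster))


cluster = {"a", "b", "c", "d", "e"}

print(majority(len(cluster)))                        # 3
print(stored_on_majority({"a", "b", "c"}, cluster))  # True  (scenario 1)
print(stored_on_majority({"a", "b"}, cluster))       # False (scenario 2)
```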

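    Scenario 1: b and c replicate the log entry; d and e don't. Then a and b crash. c has the log entry; d and e do not. The entry is committed.

    Here the entry is stored on a majority (a, b, c), so it counts as committed in Raft terms. When a and b crash, a new election has to happen. As usual, a majority is needed to elect a new leader, so all three remaining nodes (c, d, e) have to take part.

    Raft guarantees that only a node with a sufficiently up-to-date log can win an election: a node refuses to vote for a candidate whose log is older than its own. Among (c, d, e), C has the most up-to-date log, so C will be elected as the new leader. After the election, C sends the missing record to D and E. Note that a new leader does not decide that an entry from a previous term is committed just by counting copies; the old entry becomes committed once C commits an entry from its own term, which commits everything before it in the log as well.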

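    Scenario 2: b replicates the log entry; c, d and e don't. Then a and e crash. b has the log entry; c and d do not. The entry is not committed.

    Only B and the leader have the log entry, so the record is not committed: no majority accepted it. After the failure a new election happens, and B wins it for the same reason as C in scenario 1. With only b, c and d alive, a candidate needs all three votes, and B will not vote for a candidate whose log is older than its own. Once elected, B replicates the entry to C and D, so the entry ends up committed even though it was not committed before the crash.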
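The election outcome in both scenarios can be checked with a small sketch of Raft's vote rule: a voter grants its vote only if the candidate's log is at least as up to date as its own, comparing the term of the last entry first and the log length second. The `Node` class, the function names and the term/index numbers below are assumptions chosen for illustration. The sketch also ignores election timing and the one-vote-per-term rule; it only asks which node could collect enough votes.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    last_term: int   # term of the last entry in this node's log
    last_index: int  # index of the last entry in this node's log


def at_least_as_up_to_date(candidate: Node, voter: Node) -> bool:
    """Higher last term wins; on equal terms the longer log wins."""
    return (candidate.last_term, candidate.last_index) >= (
        voter.last_term,
        voter.last_index,
    )


def possible_leaders(alive: list[Node], cluster_size: int) -> list[str]:
    """Alive nodes that could get votes from a majority of the full cluster."""
    needed = cluster_size // 2 + 1
    winners = []
    for candidate in alive:
        # The candidate also counts its own vote (it compares equal to itself).
        votes = sum(at_least_as_up_to_date(candidate, voter) for voter in alive)
        if votes >= needed:
            winners.append(candidate.name)
    return winners


# Hypothetical numbers: the new entry sits at index 4 in term 2;
# nodes without it stop at index 3 in term 1.
has_entry = dict(last_term=2, last_index=4)
lacks_entry = dict(last_term=1, last_index=3)

scenario_1 = [Node("c", **has_entry), Node("d", **lacks_entry), Node("e", **lacks_entry)]
scenario_2 = [Node("b", **has_entry), Node("c", **lacks_entry), Node("d", **lacks_entry)]

print(possible_leaders(scenario_1, cluster_size=5))  # ['c']
print(possible_leaders(scenario_2, cluster_size=5))  # ['b']
```

In both scenarios only three of five nodes are alive, so a candidate needs every surviving vote, and the node holding the entry refuses to vote for anyone who lacks it.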

    A few notes

    • In Raft, a candidate can win an election only if its log is at least as up to date as the logs of the majority that votes for it.
    • A record may be in two states: proposed and committed. A record becomes committed only after a majority of nodes have it. Even if a crash happens after the commit, any majority that can elect a new leader includes at least one node that has the record, and that node will not vote for a candidate that lacks it. Hence the new leader always has the record.
    • It is interesting to consider what a client of a Raft cluster sees. If a client sends a request to the leader and the leader fails before returning a reply, the client does not know whether the record was stored or not. Not knowing the outcome is a very important property: the client cannot tell what exactly went wrong, or whether the cluster failed before or after committing the request.
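The client's uncertainty can be illustrated with a toy model. `ToyLeader` and `client_view` are hypothetical stand-ins, not a real Raft client API. The leader either crashes before committing or after committing but before replying, and the client observes the same timeout in both cases.

```python
class ToyLeader:
    """Hypothetical stand-in for a Raft leader; not a real client API."""

    def __init__(self, crash_point: str):
        self.crash_point = crash_point  # "before_commit" or "after_commit"
        self.committed = []

    def submit(self, command: str) -> str:
        if self.crash_point == "before_commit":
            raise TimeoutError("no reply from leader")
        self.committed.append(command)
        if self.crash_point == "after_commit":
            raise TimeoutError("no reply from leader")
        return "ok"


def client_view(leader: ToyLeader, command: str) -> str:
    """What the client observes for one request."""
    try:
        return leader.submit(command)
    except TimeoutError as exc:
        return f"unknown: {exc}"


lost = ToyLeader("before_commit")
kept = ToyLeader("after_commit")

# The client sees the same thing in both cases...
print(client_view(lost, "x=1") == client_view(kept, "x=1"))  # True
# ...although the state on the cluster side differs.
print(lost.committed, kept.committed)  # [] ['x=1']
```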