
In the Raft algorithm, is it possible for a follower to become leader if it gets disconnected from the current leader?


Consider the scenario below in the Raft algorithm.

  1. A leader is currently available in the cluster.
  2. The log is up to date in all nodes.
  3. One follower gets disconnected from the leader, while all other followers are still connected to the leader and are receiving heartbeats.
  4. The disconnected follower becomes a candidate and starts a vote for a new term, since it no longer receives heartbeats from the leader.

Would the other followers vote for the new candidate and elect it as the leader, while the previous leader is also still healthy?


Solution

  • "Would the other followers vote for the new candidate and elect it as the leader, while the previous leader is also still healthy?" The answer is yes: voting does not depend on whether the existing leader is healthy.

    According to the Raft paper (https://raft.github.io/raft.pdf, page 4), a follower that stops hearing heartbeats becomes a candidate: it increments its term and requests votes. The other nodes have to vote yes because all the conditions for granting a vote are met. In fact, even the leader will vote yes and step down if it receives such a vote request.

    This is the sequence of events:

    • the cluster is stable at term 10, with five nodes A, B, C, D, E; A is the leader
    • B stops getting heartbeats from A
    • B will become the candidate at some point
    • B will increase its term to 11
    • B will initiate voting
    • all nodes will vote yes, because a) the term is larger than any they have seen, b) they have not yet voted in term 11, and c) B's log is as up to date as theirs
    • when the leader sees a message with a term larger than its own, it steps down to follower and votes yes as well
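
    The sequence above can be sketched in a few lines of Python. This is only an illustration of the vote-granting rules, not a full Raft implementation: the `Node` class, its field names, and the log position (index 7) are made up for the example, and it assumes B's vote requests still reach every peer, including A.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Hypothetical, minimal node state; names are illustrative only.
    name: str
    current_term: int
    last_log_index: int
    last_log_term: int
    role: str = "follower"  # "follower", "candidate" or "leader"
    voted_for: Optional[str] = None

    def handle_request_vote(self, term, candidate, cand_last_index, cand_last_term):
        """Return True if this node grants its vote to `candidate`."""
        if term < self.current_term:
            return False  # stale candidate
        if term > self.current_term:
            # A higher term was seen: adopt it and step down, even if we are the leader.
            self.current_term = term
            self.role = "follower"
            self.voted_for = None
        # The candidate's log must be at least as up to date as ours.
        log_ok = (cand_last_term, cand_last_index) >= (self.last_log_term, self.last_log_index)
        if log_ok and self.voted_for in (None, candidate):
            self.voted_for = candidate
            return True
        return False


# Term 10, five nodes, A is the leader, all logs identical.
nodes = {n: Node(n, current_term=10, last_log_index=7, last_log_term=10) for n in "ABCDE"}
nodes["A"].role = "leader"

# B times out, becomes a candidate for term 11 and votes for itself.
b = nodes["B"]
b.role, b.current_term, b.voted_for = "candidate", 11, "B"

# B asks every other node for its vote (assumption: the requests reach all of them).
votes = 1 + sum(
    peer.handle_request_vote(b.current_term, "B", b.last_log_index, b.last_log_term)
    for name, peer in nodes.items()
    if name != "B"
)
if votes > len(nodes) // 2:
    b.role = "leader"

print(votes, nodes["A"].role, b.role)  # prints: 5 follower leader
```

    Nothing in `handle_request_vote` asks whether the current leader is still healthy, which is why the healthy leader A ends up as a follower here.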

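    Raft as described in the paper, like many similar protocols, has no concept of leader stickiness, which means nothing stops the cluster from swapping the leader even if the leader is healthy. (The paper's term "strong leader" means something else: log entries flow only from the leader to the other nodes.)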

    If Raft is implemented exactly as the paper describes, a common issue is that a disconnected but alive follower triggers a new election every time it reconnects. While disconnected, the follower keeps becoming a candidate and incrementing its term, because it cannot win an election. When it reconnects, its term is suddenly larger than the cluster's current term, so a new election happens (which the reconnected node will not win if it is missing the latest log entries).

    In practice, it is common to add a "pre-candidate" check (often called Pre-Vote): before bumping its term number, the would-be candidate checks that it has connectivity to a majority of nodes. This approach prevents unnecessary elections when the network is unstable.
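
    A rough sketch of the difference the pre-candidate check makes to the term number. The function and parameter names are hypothetical, and the check is reduced to "can I reach a majority?" as described above; real implementations typically have peers apply further conditions before answering.

```python
def term_after_timeouts(term, timeouts, reachable_peers, cluster_size, pre_vote):
    """Return a node's term after `timeouts` consecutive election timeouts.

    reachable_peers: number of other nodes that answer a connectivity probe.
    pre_vote: if True, the term is bumped only when a majority of the
              cluster (counting this node) is reachable.
    """
    majority = cluster_size // 2 + 1
    for _ in range(timeouts):
        if pre_vote and reachable_peers + 1 < majority:
            continue  # stay a follower in the same term, do not disturb the cluster
        term += 1  # become a candidate in a new term
    return term


# Isolated follower in a 5-node cluster at term 10; four election timeouts pass.
without_check = term_after_timeouts(10, 4, reachable_peers=0, cluster_size=5, pre_vote=False)
with_check = term_after_timeouts(10, 4, reachable_peers=0, cluster_size=5, pre_vote=True)

print(without_check)  # 14: on reconnect this term forces the cluster into a new election
print(with_check)     # 10: on reconnect the node simply rejoins as a follower
```

    With the check enabled, a node that can reach a majority still increments its term as usual, so legitimate elections after a real leader failure are not blocked.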