Search code examples
distributeddistributed-systemconsistencyconsensusraft

RAFT: What happens when Leader change during operation


I want to understand following scenario:

  • 3 node cluster ( n1, n2, n3 )
  • Assume n1 is the leader
  • Client send operation/request to n1
  • For some reason during the operation i.e. ( appendentry, commitEntry ... ) leader changes to n2
  • But n1 has successfully written to its log

Should this be considered as failure of operation/request ? or It should return "Leader Change"


Solution

  • This is an indeterminate result because we don't know at the time if the value will be committed or not. That is, the new leader may decide to keep or overwrite the value; we just do not have the information to know.


    In some systems that I have maintained the lower-level system (the consensus layer) gives two success results for each request. It first returns Accepted when the leader puts it into its log and then Committed when the value is sufficiently replicated.

    The Accepted result means that the value may be committed later, but it may not.

    The layers that wrap the consensus layers can be a bit smarter about the return value to the client. They have a broader view of the system and may retry the requests and/or query the new leader about the value. That way the system-as-a-whole can have the customary single return value.