architecture high-availability leader-election

Is leader election used for active/partial standby use cases for microservice replicas?

For microservice HA, I know there's the "Active/Active" configuration as well as the "Active/Standby" configuration.

I'm wondering if you might use Leader Election for something in the middle. For example, say there are 3 microservices which all share the load of processing messages off a bus and storing them in a database. But only one of them is needed to do daily database purging. Might you use leader election to determine which one should do that task?

Or is that always a bad design and should you use a separate application for that instead?

Solution

Leaderless

In a distributed system where all nodes run identically, we call the environment leaderless. As the name suggests, there is no leader to follow. Each node can perform the same set of operations.

Leader-Follower

If there are one or more special operations that must run on a single node, then you have multiple options:

Introduce leader-follower
Put this operation into a separate service
Make the operation idempotent (a rerun will not cause any unwanted side effects)
etc.
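To illustrate the idempotent option with the daily purge from the question, here is a minimal sketch. The `messages` table, its `created_at` column, and the `purge_older_than` helper are hypothetical names invented for this example, and an in-memory SQLite database stands in for the real one. Because the purge deletes by an absolute cutoff, it does not matter whether one replica or all three run it.

```python
import sqlite3


def purge_older_than(conn: sqlite3.Connection, cutoff: str) -> int:
    """Delete every message created before `cutoff`; return the rows removed."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "DELETE FROM messages WHERE created_at < ?", (cutoff,)
        )
    return cur.rowcount


# Hypothetical schema and data, only for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany(
    "INSERT INTO messages (created_at) VALUES (?)",
    [("2024-01-01",), ("2024-01-15",), ("2024-03-01",)],
)
conn.commit()

# Two replicas running the same purge: the second run finds nothing to delete.
first = purge_older_than(conn, "2024-02-01")
second = purge_older_than(conn, "2024-02-01")
remaining = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
```

The duplicate run costs an extra query, but it cannot delete anything the first run would not have deleted, so no coordination between replicas is needed for correctness.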

Let's examine the first option. How can you guarantee that only one node can execute the operation?

Use a queuing system
- where each message can be fetched by a single consumer
Acquire an exclusive lock
Have a consensus protocol*
- to agree on who will be the leader
etc.

*Leader election is a specialised form of the consensus problem
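The exclusive lock option can be sketched as a lease row in the database the replicas already share. This is only an illustration under stated assumptions: the `job_lock` table and the `try_acquire` helper are hypothetical, SQLite stands in for the shared database, and the timestamps are passed in by hand, whereas a real deployment would more likely rely on the database server's clock.

```python
import sqlite3


def try_acquire(conn: sqlite3.Connection, job: str, owner: str,
                now: float, ttl: float) -> bool:
    """Try to take the lease for `job`; return True only for the winner."""
    try:
        with conn:  # one transaction: commit on success, roll back on error
            # Drop the lease if its previous holder let it expire.
            conn.execute(
                "DELETE FROM job_lock WHERE job = ? AND expires_at <= ?",
                (job, now),
            )
            # The primary key on `job` lets at most one row (one owner) exist.
            conn.execute(
                "INSERT INTO job_lock (job, owner, expires_at) VALUES (?, ?, ?)",
                (job, owner, now + ttl),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # somebody else holds a lease that has not expired


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE job_lock (job TEXT PRIMARY KEY, owner TEXT, expires_at REAL)"
)

# Three replicas wake up for the daily purge at the same time.
winners = [
    replica
    for replica in ("replica-1", "replica-2", "replica-3")
    if try_acquire(conn, "daily-purge", replica, now=0.0, ttl=60.0)
]
# Once the lease has expired, another replica can take over.
later = try_acquire(conn, "daily-purge", "replica-2", now=61.0, ttl=60.0)
```

The expiry time is what keeps a crashed holder from blocking the job forever. It also means a holder that is merely slow can lose the lease while it is still working, which is one reason to keep the purge itself idempotent.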

Leader election is a complex problem, because you have to solve a lot of sub-problems, such as:

How to detect that a leader is failing?
How to make sure that there is only one leader in a cluster?
- If there are multiple leaders at the same time, that state is called split brain
How do you handle network partitioning?
How to make sure that the system reaches consensus within a bounded time?
etc.
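To make the first two questions concrete, here is a minimal sketch of timeout-based failure detection. The `LeaderLease` class and its fields are hypothetical names for this example, not part of any library. A follower suspects the leader once heartbeats have been silent for longer than a timeout.

```python
from dataclasses import dataclass


@dataclass
class LeaderLease:
    """What a follower knows about the current leader (hypothetical structure)."""
    leader_id: str
    last_heartbeat: float  # time at which the last heartbeat arrived
    timeout: float         # silence longer than this is treated as a failure

    def record_heartbeat(self, now: float) -> None:
        self.last_heartbeat = now

    def leader_suspected(self, now: float) -> bool:
        # Suspicion only: the leader may be alive but slow or partitioned away.
        return now - self.last_heartbeat > self.timeout


lease = LeaderLease(leader_id="node-a", last_heartbeat=0.0, timeout=5.0)
lease.record_heartbeat(now=3.0)

suspected_early = lease.leader_suspected(now=7.0)  # 4.0 of silence
suspected_late = lease.leader_suspected(now=8.5)   # 5.5 of silence
```

A timeout cannot tell a crashed leader from a slow or unreachable one. If a follower promotes itself while the old leader is still running, the cluster ends up in the split-brain state mentioned above, which is why real protocols add quorums and term numbers on top of heartbeats.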

These questions are more or less solved by the Paxos, Raft and ZAB protocols.

Back to your use case

I'm not saying that leader election is a bad idea here, but it certainly makes your system more complex, and it is hard to implement correctly.

I also want to emphasize that leader election shines in use cases where you have a complex flow or logic that needs a coordinator (such as data replication).