I assume the main purpose of a cluster is fault tolerance. However, the following Consul cluster does not survive the loss of a single server, and I don't understand why.
version: "3.5"
services:
  # docker network create --driver=bridge discovery-network
  # SERVICE DISCOVERY
  consul-server-0:
    image: consul:1.6.0
    container_name: consul-server-0
    command: "agent -server -bootstrap-expect 2 -client 0.0.0.0 -datacenter datacenter-1 -node consul-server-0"
    networks:
      - discovery-network
  consul-server-1:
    image: consul:1.6.0
    container_name: consul-server-1
    command: "agent -server -retry-join consul-server-0 -client 0.0.0.0 -datacenter datacenter-1 -node consul-server-1"
    networks:
      - discovery-network
    depends_on:
      - consul-server-0
  consul-client-1:
    image: consul:1.6.0
    container_name: consul-client-1
    command: "agent -retry-join consul-server-0 -ui -client 0.0.0.0 -datacenter datacenter-1 -node consul-client-1"
    ports:
      - "8500:8500" # GUI
    networks:
      - discovery-network
    depends_on:
      - consul-server-0
networks:
  discovery-network:
    external: true
When I stop one of the servers, the cluster stops working: I can no longer register any services (through consul-client).
In the remaining server's logs, I can see the message "Failed to make RequestVote RPC".
In the client's logs, I can see the message "No cluster leader".
What is wrong with my configuration?
The thing with Consul is that its servers need to reach a quorum (a majority of the servers) before they can elect a leader. For your servers you're using -bootstrap-expect 2, which tells the server to wait until two server nodes are present before starting the first leader election.
If you only have 2 server nodes and one of them fails (or is broken), the cluster loses its quorum. A majority of 2 servers is 2, so the node that is left over cannot win an election by itself; that is why its log shows failed RequestVote RPCs. This rule is what protects the cluster from a split-brain situation, but it also means that without a leader the cluster will not accept any new registrations.
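If you have a cluster of 3 server nodes and one fails, it should be able to continue running, because the remaining 2 servers still form a majority. Generally with cluster setups an odd number of server nodes is a good idea: a cluster of N servers needs floor(N/2) + 1 of them to be available, so 2 servers tolerate 0 failures, 3 tolerate 1, and 5 tolerate 2. Client agents do not count towards the quorum.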
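As a rough sketch of what a three-server variant of your compose file could look like (untested, and assuming the same consul:1.6.0 image and the external discovery-network from your question; the consul-server-2 service is my addition):

```yaml
version: "3.5"
services:
  # Three servers: quorum is 2, so one server may be stopped.
  # Each server waits for 3 servers before the first election
  # and keeps retrying to join the other two.
  consul-server-0:
    image: consul:1.6.0
    container_name: consul-server-0
    command: "agent -server -bootstrap-expect 3 -retry-join consul-server-1 -retry-join consul-server-2 -client 0.0.0.0 -datacenter datacenter-1 -node consul-server-0"
    networks:
      - discovery-network
  consul-server-1:
    image: consul:1.6.0
    container_name: consul-server-1
    command: "agent -server -bootstrap-expect 3 -retry-join consul-server-0 -retry-join consul-server-2 -client 0.0.0.0 -datacenter datacenter-1 -node consul-server-1"
    networks:
      - discovery-network
  # Hypothetical third server, not part of the original setup.
  consul-server-2:
    image: consul:1.6.0
    container_name: consul-server-2
    command: "agent -server -bootstrap-expect 3 -retry-join consul-server-0 -retry-join consul-server-1 -client 0.0.0.0 -datacenter datacenter-1 -node consul-server-2"
    networks:
      - discovery-network
  # The client lists every server, so it can still join
  # when consul-server-0 is the one that is down.
  consul-client-1:
    image: consul:1.6.0
    container_name: consul-client-1
    command: "agent -retry-join consul-server-0 -retry-join consul-server-1 -retry-join consul-server-2 -ui -client 0.0.0.0 -datacenter datacenter-1 -node consul-client-1"
    ports:
      - "8500:8500" # GUI
    networks:
      - discovery-network
networks:
  discovery-network:
    external: true
```

I left out depends_on here because -retry-join keeps retrying until the other nodes are reachable, so the start order should not matter. With this layout, stopping any single server should leave two servers that can still elect a leader.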