I have a Consul cluster that should normally have 5 servers and a bunch of clients. Our script to start the servers was originally configured like this:
consul agent -server -bootstrap-expect 5 -join <ips of all 5 servers>
However, we had to reinstall the OS on all servers and bootstrap again. One of our servers was down with hardware issues, and the bootstrap no longer works.
My question is -- in a situation where there are 5 servers, but 3 are sufficient for quorum, should -bootstrap-expect be set to 3?
The documentation at https://www.consul.io/docs/agent/options.html#_bootstrap_expect seems to imply that -bootstrap-expect should be set to the total number of servers, which means that even a single machine being down will prevent the cluster from bootstrapping.
To be clear, our startup scripts are static files, so when I say there are 5 servers, it means that up to 5 could be started with the server tag.
In your case, if you don't explicitly need all 5 servers to be online during initial cluster setup, you should set -bootstrap-expect to 3. This avoids the situation you ran into: with 5 servers that are all told to wait until 5 are online, the cluster cannot bootstrap while even one of them is down. As the documentation says:
When provided, Consul waits until the specified number of servers are available and then bootstraps the cluster. This allows an initial leader to be elected automatically.
With -bootstrap-expect=3, leader election starts as soon as 3 of your 5 Consul servers have joined the cluster, and the cluster keeps functioning even if the last 2 join much later. For that matter, any number of servers can join at a later time.
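As a rough sketch of how a static startup script could apply this, the snippet below derives the majority for a planned server count and prints the matching command line instead of running it. The helper names `quorum` and `build_cmd` and the 10.0.0.x addresses are placeholders made up for illustration, and it assumes `-join` is passed once per address; any other flags your setup needs are left out, as in the question.

```shell
#!/bin/sh
# Sketch only: helper names and IP addresses are hypothetical placeholders.

# quorum N: print the smallest majority of N servers, i.e. floor(N/2) + 1.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# build_cmd EXPECT IP...: print a consul server command line that waits for
# EXPECT servers before bootstrapping and tries to join every listed IP.
build_cmd() {
    expect=$1
    shift
    cmd="consul agent -server -bootstrap-expect $expect"
    for ip in "$@"; do
        cmd="$cmd -join $ip"
    done
    echo "$cmd"
}

# 5 planned servers, so bootstrap as soon as a majority (3) is available.
build_cmd "$(quorum 5)" 10.0.0.1 10.0.0.2 10.0.0.3 10.0.0.4 10.0.0.5
```

The same printed command would go into the script on every server, so that all of them agree on the -bootstrap-expect value.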