I have an AWS launch configuration for a Consul cluster. Until recently it ran without problems, but now it doesn't work: querying any node results in "no leader elected".
So I SSH'd into the instance. Running consul info results in:
Error querying agent: Get http://127.0.0.1:8500/v1/agent/self: dial tcp 127.0.0.1:8500: getsockopt: connection refused
Next I tried:
$ ps -ef | grep consul
consul 2760 1 0 Nov28 ? 00:01:38 /usr/local/bin/consul agent -server -config-file=/etc/consul.conf -data-dir=/tmp/consul -node=1.1.1.1_i-042b3e8f28c622a -bind=2.2.2.2 -config-dir=/etc/consul.d
(I've hidden the IP and instance IDs here)
Looking at the log I see:
==> WARNING: Expect Mode enabled, expecting 3 servers
==> Starting Consul agent...
==> Consul agent running!
Version: 'v0.8.3'
Node ID: '6e0b3c-ad49-90d7-c8e2-121144a4ba'
Node name: '1.1.1.1_i-029b3e8f28622a'
Datacenter: 'dc1'
Server: true (bootstrap: false)
Client Addr: 127.0.0.1 (HTTP: 8500, HTTPS: -1, DNS: 8600)
Cluster Addr: 2.2.2.2 (LAN: 8301, WAN: 8302)
Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
Atlas: <disabled>
==> Log data will now stream in as it occurs:
2017/11/28 13:19:36 [INFO] raft: Initial configuration (index=0): []
2017/11/28 13:19:36 [INFO] serf: EventMemberJoin: 1.1.1.1_i-029b3e8f28c46622a 2.2.2.2
2017/11/28 13:19:36 [INFO] serf: EventMemberJoin: 1.1.1.1_i-029b3e8f28c46622a.dc1 2.2.2.2
2017/11/28 13:19:36 [INFO] raft: Node at 2.2.2.2:8300 [Follower] entering Follower state (Leader: "")
2017/11/28 13:19:36 [INFO] consul: Adding LAN server 1.1.1.1_i-029b3e8f28c46622a (Addr: tcp/2.2.2.2:8300) (DC: dc1)
2017/11/28 13:19:36 [INFO] consul: Handled member-join event for server "1.1.1.1_i-029b3e8f28c22a.dc1" in area "wan"
2017/11/28 13:19:36 [INFO] agent: Joining cluster...
2017/11/28 13:19:36 [INFO] agent: No EC2 region provided, querying instance metadata endpoint...
2017/11/28 13:19:36 [INFO] agent: Discovered 0 servers from EC2
2017/11/28 13:19:36 [WARN] agent: Join failed: No servers to join, retrying in 30s
2017/11/28 13:19:43 [ERR] agent: failed to sync remote state: No cluster leader
Any ideas on how to troubleshoot this?
You need to bootstrap the cluster to allow the initial leader election. The easiest way is to start each server with -bootstrap-expect set to the number of servers in the cluster (use the same flag and value on all servers).
More information about bootstrapping a cluster: https://www.consul.io/docs/guides/bootstrapping.html
and https://www.consul.io/docs/agent/options.html#_bootstrap
In your case the log says "WARNING: Expect Mode enabled, expecting 3 servers", so the agent waits until it sees 3 servers before bootstrapping the cluster. It looks like you are running only two? Join another one and it should work (fewer than 3 servers isn't recommended for consensus systems).
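To make the "fewer than 3" remark concrete, here is a minimal Python sketch of the majority arithmetic behind Raft-style consensus, plus a small helper that pulls the expected and discovered server counts out of a log like the one in the question. The function names (quorum, tolerated_failures, expected_vs_discovered) are made up for this illustration and are not part of Consul; the regular expressions assume the log wording shown above ("expecting N servers", "Discovered N servers").

```python
import re


def quorum(servers: int) -> int:
    """Smallest majority of a cluster with `servers` voting members."""
    if servers < 1:
        raise ValueError("a cluster needs at least one server")
    return servers // 2 + 1


def tolerated_failures(servers: int) -> int:
    """How many servers can be lost while a majority is still reachable."""
    return servers - quorum(servers)


def expected_vs_discovered(log_text: str):
    """Return (expected, discovered) server counts found in the log text.

    Assumes the log wording from the question; either value is None
    when the corresponding line is not present.
    """
    expected = re.search(r"expecting (\d+) servers", log_text)
    discovered = re.findall(r"Discovered (\d+) servers", log_text)
    return (
        int(expected.group(1)) if expected else None,
        int(discovered[-1]) if discovered else None,  # most recent attempt
    )


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"servers={n} quorum={quorum(n)} "
              f"tolerated_failures={tolerated_failures(n)}")
```

With two servers the quorum is also two, so losing either one stops leader election; three servers is the smallest size that survives one failure. Running expected_vs_discovered on the log from the question would report 3 expected and 0 discovered, which suggests checking why the join step finds no peers.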