
Docker Swarm with Consul - Manager not electing primary


I'm trying to set up an HA Docker cluster on 3 dedicated PCs. I've successfully followed the instructions on docs.docker.com/engine/installation/linux/ubuntulinux and now I'm trying to follow the instructions on https://docs.docker.com/swarm/install-manual. Since I'm not using any virtualization, I start at "Set up an consul discovery backend". The PCs (running Ubuntu Trusty 14.04 Server Edition) are all in the LAN 192.168.2.0/24: ubuntu001 has .104, ubuntu002 has .106, and ubuntu003 has .105.

I did the following according to the instructions:

arnolde@ubuntu001:~$ docker run -d -p 8500:8500 --name=consul progrium/consul -server -bootstrap

arnolde@ubuntu001:~$ docker run -d -p 4000:4000 swarm manage -H :4000 --replication --advertise 192.168.2.104:4000  consul://192.168.2.104

arnolde@ubuntu002:~# docker run -d swarm manage -H :4000 --replication --advertise 192.168.2.106:4000  consul://192.168.2.104:8500

arnolde@ubuntu003:~$ docker run -d swarm join --advertise=192.168.2.105:2375 consul://192.168.2.104:8500

But then when trying the next step, the swarm manager does NOT show up as "Primary" like it says it should, and no primary is listed:

arnolde@ubuntu001:~$ docker -H :4000 info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 0
Server Version: swarm/1.1.0
Role: replica
Primary: 
Strategy: spread
Filters: health, port, dependency, affinity, constraint
Nodes: 0
Plugins: 
 Volume: 
 Network: 
Kernel Version: 3.19.0-25-generic
Operating System: linux
Architecture: amd64
CPUs: 0
Total Memory: 0 B

And:

arnolde@ubuntu001:~$ docker -H :4000 run hello-world
docker: Error response from daemon: No elected primary cluster manager.

I searched and found https://github.com/docker/swarm/issues/1491, which recommends using dockerswarm/swarm:master instead. I tried that, but it didn't help:

arnolde@ubuntu001:~$ docker run -d -p 4000:4000 dockerswarm/swarm:master manage -H :4000 --replication --advertise 192.168.2.104:4000  consul://192.168.2.104

I didn't find any other input regarding swarm+consul+primary, so here I am... any suggestions? Unfortunately I'm not sure how to troubleshoot, since I don't even know where to look for logging/debugging info, e.g. whether the manager is connecting to consul successfully.


Solution

  • I was able to solve it myself by explicitly adding the port number to the consul:// parameter. Apparently the Docker docs are incomplete on this point:

    arnolde@ubuntu001:~$ docker run -d -p 4000:4000 dockerswarm/swarm:master manage -H :4000 --replication --advertise 192.168.2.104:4000 consul://192.168.2.104:8500
    arnolde@ubuntu001:~$ docker -H :4000 info
    Containers: 0
     Running: 0
     Paused: 0
     Stopped: 0
    Images: 0
    Server Version: swarm/1.1.0
    Role: replica
    Primary: 192.168.2.106:4000
    

    Also I added "-p 4000:4000" to the command on the replica manager (on ubuntu002). Not sure if that was necessary (or even a good idea).
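
To keep the port from being forgotten again, the fix can be sketched as a small shell helper. This is only an illustration: `ensure_consul_port` and `debug_swarm_manager` are hypothetical names, not part of Docker or Consul, and the helper assumes Consul is listening on its default HTTP port 8500 (as published with `-p 8500:8500` above) and that the host is a plain hostname or IPv4 address. The second function is one possible way to look for debugging info, using `docker logs` and Consul's `/v1/status/leader` HTTP endpoint; it is defined but not run here because it needs a live Docker daemon and Consul agent.

```shell
#!/bin/sh
# Hypothetical helper: append Consul's default HTTP port (8500) to a
# consul:// discovery URL when no port is given. Assumes a hostname or
# IPv4 address (IPv6 literals are not handled).
ensure_consul_port() {
    url=$1
    rest=${url#consul://}      # strip the scheme
    hostport=${rest%%/*}       # host[:port], without any path
    case $rest in
        */*) path=/${rest#*/} ;;   # keep an optional key prefix path
        *)   path= ;;
    esac
    case $hostport in
        *:*) printf 'consul://%s%s\n' "$hostport" "$path" ;;
        *)   printf 'consul://%s:8500%s\n' "$hostport" "$path" ;;
    esac
}

# Hypothetical troubleshooting helper (not invoked here).
# $1 = swarm manager container ID or name, $2 = Consul host:port
debug_swarm_manager() {
    docker logs "$1"                           # manager output, including discovery errors
    curl -s "http://$2/v1/status/leader"       # Consul's own Raft leader; confirms Consul is reachable
}

ensure_consul_port consul://192.168.2.104        # prints consul://192.168.2.104:8500
ensure_consul_port consul://192.168.2.104:8500   # prints consul://192.168.2.104:8500
```

The result can then be passed to the manager, e.g. `swarm manage ... "$(ensure_consul_port consul://192.168.2.104)"`, and `debug_swarm_manager <container> 192.168.2.104:8500` can be run on the manager host to check whether the manager is reaching Consul.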