Tags: docker, docker-compose, mqtt-vernemq

Clustering vernemq docker containers running on different machines


I was hoping this would be as easy as adding the snippet below to the second instance's docker-compose.yml file:

- DOCKER_VERNEMQ_DISCOVERY_NODE=<ip address of the first instance> 

but that doesn't seem to work.

The log of the second instance confirms that it is attempting to cluster:

13:56:09.795 [info] Sent join request to: 'VerneMQ@<ip address of the first instance>'
13:56:16.800 [info] Unable to connect to 'VerneMQ@<ip address of the first instance>'

Meanwhile, the log of the first instance shows nothing at all.

From within the second instance I can confirm that the endpoint is accessible:

$ docker exec -it vernemq /bin/sh
$ curl <ip address of the first instance>:44053
curl: (56) Recv failure: Connection reset by peer

Then, in the log of the first instance, I see an error, which is entirely expected and confirms that I've reached the first instance:

13:58:33.572 [error] CRASH REPORT Process <0.3050.0> with 0 neighbours crashed with reason: bad argument in vmq_cluster_com:process_bytes/3 line 142
13:58:33.572 [error] Ranch listener {{172,19,0,2},44053} terminated with reason: bad argument in vmq_cluster_com:process_bytes/3 line 142

It might have to do with the fact that the IP address as seen from within the Docker container is 172.19.0.2, while the external one is 10. ....

I also tried adding the hostname of the first instance to known_hosts, to no avail.

Please advise.

I'm using erlio/docker-vernemq:1.10.0

$ docker --version
Docker version 19.03.13, build 4484c46d9d

$ docker-compose --version
docker-compose version 1.27.2, build 18f557f9

Solution

  • I managed to get this sorted by creating a Docker overlay network:

    on machine1: docker swarm init
    on machine2: docker swarm join --token ...
    on machine1: docker network create --driver=overlay --attachable vernemq-overlay-net

    The relevant bits of my docker-compose.yml are:

    version: '3.6'
    
    services:
      vernemq:
        container_name: ${NODE_NAME:?Node name not specified}
        image: vernemq/vernemq:1.10.4.1
        environment:
          - DOCKER_VERNEMQ_NODENAME=${NODE_NAME:?Node name not specified}
          - DOCKER_VERNEMQ_DISCOVERY_NODE=${DISCOVERY_NODE:-}
    
    networks:
      default:
        external:
          name: vernemq-overlay-net
    

    with the following env vars:

    machine1:

    • NODE_NAME=vernemq1.example.com
    • DISCOVERY_NODE=

    machine2:

    • NODE_NAME=vernemq2.example.com
    • DISCOVERY_NODE=vernemq1.example.com

    Note:
    Chances are that docker-compose on machine2 won't find vernemq-overlay-net; as far as I remember, this is due to a bug in docker-compose.
    In that case, first start a container on that network directly with docker, which makes the network available to docker-compose:

    docker run -dit --name alpine --net=vernemq-overlay-net alpine
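
A minimal sketch of one way to supply the env vars listed above, assuming they are kept in a .env file next to docker-compose.yml (docker-compose reads that file from the project directory for ${...} substitution). The write_env helper and the machine1/machine2 directories are hypothetical and only there to show both files side by side; on real hosts each machine would hold just its own .env.

```shell
#!/bin/sh
# Hypothetical helper: write the .env file that docker-compose picks up
# for ${NODE_NAME} and ${DISCOVERY_NODE} substitution.
write_env() {
    # $1 = target directory, $2 = NODE_NAME, $3 = DISCOVERY_NODE (may be empty)
    mkdir -p "$1"
    printf 'NODE_NAME=%s\nDISCOVERY_NODE=%s\n' "$2" "$3" > "$1/.env"
}

# machine1 is the first node, so it has nothing to discover.
write_env ./machine1 vernemq1.example.com ""

# machine2 joins the cluster through machine1.
write_env ./machine2 vernemq2.example.com vernemq1.example.com
```

After running docker-compose up -d on both machines, docker exec vernemq1.example.com vmq-admin cluster show should list both nodes if the join succeeded.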