
Galera Cluster Rancher Node Scaling Issue


I have a very specific issue. I am using Rancher to manage my Docker containers, with the Galera Cluster community template. I added a second host to my setup, and now whenever I scale the cluster it keeps failing with:

level=fatal msg="invalid character '<' looking for beginning of value"

I cannot trace the error, and I have no idea where to look.

My research so far: it might be related to the protocol used for host communication (http vs. https), since other people have hit this error in that context.

My question: how can I trace, debug, and fix this error?

Additional information:
Docker Version on both hosts: 1.12.5
Rancher Version: v1.1.4

If you need anything else, I will be happy to provide more information.


Solution

  • It took me a while to figure this one out. It is actually not a problem with the template or with Galera Cluster itself. The issue lies in how Rancher / Docker looks up IP addresses within the environment. By default, Ubuntu uses a DNS server on the local address 127.0.0.1, and this is by design. The problem is that Docker containers cannot query 127.0.0.1 on the host: inside a container, that address refers to the container itself.

    Check the /etc/resolv.conf file on your host.
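
    As a quick check, a small shell sketch like the one below lists any loopback nameservers in that file. The function name find_loopback_nameservers is made up for illustration and is not part of Docker or Rancher:

    ```shell
    # Print every loopback nameserver (127.x.x.x or ::1) in a resolv.conf-style file.
    # Exit status is 0 if at least one was found, non-zero otherwise.
    find_loopback_nameservers() {
        file="${1:-/etc/resolv.conf}"
        awk '$1 == "nameserver" && ($2 ~ /^127\./ || $2 == "::1") { print $2; found = 1 }
             END { exit !found }' "$file"
    }

    find_loopback_nameservers /etc/resolv.conf || echo "no loopback nameserver configured"
    ```

    If this prints an address such as 127.0.0.1, the host is using a local resolver, which is the situation described above.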

    Perform a test:

    docker run -it ubuntu bash
    apt update
    apt install dnsutils
    # This will not respond
    dig @127.0.0.1 your.hostname.com
    

    Note: ping WILL work fine, which can trick you into thinking that name resolution is working. dig is the proper tool for checking it.

    You can also use the Rancher CLI to get a hint about the problem:

    mkdir -p support
    rancher hosts -a > support/hosts
    rancher logs --tail=-1 ipsec/ipsec > support/ipsec 2>&1
    rancher logs --tail=-1 network-services/metadata > support/metadata 2>&1
    rancher logs --tail=-1 network-services/network-manager > support/network-manager 2>&1
    

    Solution:

    There are two solutions:

    1 - Configure Ubuntu to use another nameserver, such as Google Public DNS (8.8.8.8, 8.8.4.4). I tried this one, and it is far too complicated for such a simple change because, as I said, Ubuntu uses the local address by design.

    2 - Change the DNS servers that Docker uses. This worked fine for me. Edit or create the file /etc/docker/daemon.json and add:

    {
      "dns": ["8.8.8.8", "8.8.4.4"]
    }
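
    Because a syntax error in daemon.json will keep the Docker daemon from starting, it may be worth validating the file before restarting. This is a minimal sketch, assuming python3 is installed on the host; check_daemon_json is a hypothetical helper name, not a Docker command:

    ```shell
    # Report whether the given file parses as JSON (default: /etc/docker/daemon.json).
    # Assumption: python3 is available; its json.tool module does the parsing.
    check_daemon_json() {
        file="${1:-/etc/docker/daemon.json}"
        if python3 -m json.tool "$file" > /dev/null 2>&1; then
            echo "valid JSON: $file"
        else
            echo "invalid or missing JSON: $file" >&2
            return 1
        fi
    }

    check_daemon_json /etc/docker/daemon.json || true
    ```

    This only checks the JSON syntax. It does not check that the keys are ones the daemon understands.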
    

    Stop the containers and restart the daemon:

    docker stop $(docker ps -q)
    docker stop $(docker ps -q) # yes, twice :-) Rancher will try to restart your dying containers
    systemctl restart docker
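
    After the restart, one way to confirm that the change took effect is to look at the resolv.conf a fresh container receives. The sketch below assumes a plain docker run on the default bridge network (containers started by Rancher may be given different DNS settings), and has_nameservers is a made-up helper name:

    ```shell
    # Succeed only if every address given as an argument appears as a
    # "nameserver" entry in the resolv.conf content read from stdin.
    has_nameservers() {
        conf=$(cat)
        for server in "$@"; do
            printf '%s\n' "$conf" |
                awk -v s="$server" '$1 == "nameserver" && $2 == s { f = 1 } END { exit !f }' ||
                return 1
        done
    }

    # Only run the check when the docker CLI is present on this machine.
    if command -v docker > /dev/null 2>&1; then
        docker run --rm ubuntu cat /etc/resolv.conf | has_nameservers 8.8.8.8 8.8.4.4 \
            && echo "containers use the configured DNS servers"
    fi
    ```

    If nothing is printed, inspect the container's /etc/resolv.conf by hand to see which nameservers it actually received.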
    

    Big thanks to Giovanni