I have set up a Docker Swarm with 3 nodes. They are all configured as part of the cluster. The manager node is also a host for containers.
Each node has an NFS mount for storing centralised data.
I have created an ElasticSearch cluster that is global, so it runs on all the nodes. This is configured to run in Docker Swarm using DNS Round Robin (dnsrr) in the cluster. As dnsrr does not allow ports to be exposed, I have an Nginx proxy server that listens for requests on port 19200 and then proxies them to the service elasticsearch on port 9200.
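For readers who cannot open the pastebin, a proxy of this kind can be sketched roughly as follows. This is not the actual nginx.conf linked further down, only an illustration under assumptions: the output file name nginx.conf.sketch is hypothetical, 127.0.0.11 is Docker's embedded DNS server, and the resolver-plus-variable pattern is one common way to make Nginx look up the dnsrr name at request time instead of caching the addresses it saw at startup.

```shell
# Write a minimal Nginx config that listens on 19200 and proxies to the
# "elasticsearch" service on 9200. The file name is a placeholder.
cat > nginx.conf.sketch <<'EOF'
events {}

http {
    server {
        listen 19200;

        # Docker's embedded DNS; short validity so new task IPs are picked up.
        resolver 127.0.0.11 valid=10s;

        location / {
            # Using a variable makes Nginx resolve the name at request time.
            set $es_upstream http://elasticsearch:9200;
            proxy_pass $es_upstream;
        }
    }
}
EOF
```

The heredoc delimiter is quoted so the shell leaves `$es_upstream` untouched for Nginx to interpret.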
The Traefik, ElasticSearch Proxy and ElasticSearch nodes are all attached to the same overlay network called elastic_cluster. The network is 10.0.10.0/24.
I have configured Traefik so that it has a named host and answers to elasticsearch.homenetwork.local (not the real domain) on port 443 (Let's Encrypt is configured), and it should forward to the Nginx proxy.
However, when I try to hit https://elasticsearch.homenetwork.local I get an error in the Traefik logs:
time="2017-12-02T22:50:37Z" level=warning msg="Error forwarding to http://10.0.10.3:19200, err: dial tcp 10.0.10.3:19200: getsockopt: no route to host"
Given that the Traefik service is on the 10.0.10.0/24 network, I do not understand why I get this error. I am using Portainer to track the services and I can see that the Traefik service has an IP address of 10.0.10.4.
If I run an interactive session in the Nginx proxy container, which has an IP address of 10.0.10.7, I am able to ping 10.0.10.4 without issue.
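A ping only shows that ICMP gets through between those two addresses, while the Traefik log reports a failed TCP dial to 10.0.10.3:19200. A TCP-level check is closer to what Traefik does. The helper below is a minimal sketch, assuming bash and the coreutils timeout command are available in the container being tested; the function name tcp_probe is made up for illustration.

```shell
# Return 0 if a TCP connection to host:port can be opened within 3 seconds.
# Relies on bash's /dev/tcp redirection, so no extra tools are needed
# beyond bash and coreutils' timeout.
tcp_probe() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Example (address and port taken from the Traefik log):
#   tcp_probe 10.0.10.3 19200 && echo reachable || echo unreachable
```

Running the probe from a container on each node in turn would help show whether the failure follows one particular host.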
The containers are all running Ubuntu. There is no iptables involved.
Has anyone seen something like this before? I am struggling to work out what is wrong here so if anyone has any suggestions I would be very grateful for them. What is really annoying is that this used to work. I do not recall changing anything but obviously something has.
ElasticSearch service command:
docker service create --name elasticsearch \
--network elastic_cluster \
--constraint "node.labels.app_role == elasticsearch" \
--mode global \
--endpoint-mode dnsrr \
docker.elastic.co/elasticsearch/elasticsearch:5.4.2 \
elasticsearch
ElasticSearch Proxy service command:
docker service create --name elasticsearch_proxy \
--network elastic_cluster \
--label traefik.enable=true \
--label traefik.backend=elasticsearch_proxy \
--label traefik.port=19200 \
--label traefik.frontend.rule=Host:elasticsearch.home.turtlesystems.co.uk \
--label traefik.docker.network=elastic_cluster \
nginx:1.13
nginx.conf: https://pastebin.com/Q5sXw6aw
Traefik service command:
docker service create --name reverse_proxy \
--network elastic_cluster \
--network traefik-net \
--constraint "node.role == manager" \
--publish 80:80 \
--publish 8080:8080 \
--publish 443:443 \
traefik
traefik.toml: https://pastebin.com/GFPu8MYJ
After all of this, it turned out that the network problems were caused by a faulty network cable.
After replacing it, everything worked. Thank you for all the suggestions.
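For anyone hitting similar symptoms, a bad cable can sometimes show up as rising error counters on the host's network interface. The snippet below is a rough sketch of how those counters could be listed on a Linux host; it assumes the standard sysfs layout under /sys/class/net, and the function name link_error_report is hypothetical. It is not how the fault was found in this case.

```shell
# Print rx/tx error counters for every interface under a sysfs-style directory.
# The optional argument defaults to /sys/class/net on Linux.
link_error_report() {
    net_dir="${1:-/sys/class/net}"
    for iface_dir in "$net_dir"/*; do
        stats="$iface_dir/statistics"
        # Skip entries that do not expose statistics files.
        [ -r "$stats/rx_errors" ] || continue
        printf '%s rx_errors=%s tx_errors=%s\n' \
            "$(basename "$iface_dir")" \
            "$(cat "$stats/rx_errors")" \
            "$(cat "$stats/tx_errors")"
    done
}

# Example: run on each swarm node and compare the numbers over time.
#   link_error_report
```

Counters that keep climbing on one node but not the others would point at that node's physical link.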