Tags: docker, docker-swarm, docker-network, docker-stack

Docker swarm DNS only returns services on local mode


I want to automatically scrape data from all instantiated services in my Docker swarm with Prometheus. I do this on a cluster with two workers and about 7 services. The services I want to scrape are deployed globally.

I've set Prometheus up to scrape using dns_sd_config with the target tasks.cadvisor. This returns only a single host, while it should return two.
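For reference, the scrape configuration looks roughly like this (a sketch; the job name and port are assumptions, with 8080 being cAdvisor's default):

```yaml
scrape_configs:
  - job_name: 'cadvisor'        # name is an assumption
    dns_sd_configs:
      - names:
          - 'tasks.cadvisor'    # swarm DNS name resolving to all task IPs
        type: 'A'
        port: 8080              # cAdvisor's default port
```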

> tasks.cadvisor
Server:         127.0.0.11
Address:        127.0.0.11#53

Non-authoritative answer:
Name:   tasks.cadvisor
Address: 10.0.1.9

In this example I can only find a single cAdvisor instance, while there are actually two.

However, when I do a lookup for a service that runs twice on the same worker node, the lookup manages to find both instances:

> tasks.nginx
Server:         127.0.0.11
Address:        127.0.0.11#53

Non-authoritative answer:
Name:   tasks.nginx
Address: 10.0.1.25
Name:   tasks.nginx
Address: 10.0.1.20

It seems like Docker DNS cannot do a lookup beyond its own worker node. How can I set Docker up so that the DNS lookup returns all service instances across all workers?
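As a sanity check, you can confirm that the swarm really does run two cadvisor tasks (note: when deployed via `docker stack deploy`, the service name is prefixed with your stack name, so adjust accordingly):

```shell
# Show all tasks of the cadvisor service and which node each runs on;
# with mode: global there should be one task per node.
docker service ps cadvisor

# List the swarm nodes to verify both workers are Ready and Active.
docker node ls
```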

Here's my current docker setup:

version: '3'
services:
  db:
    image: postgres
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.role == manager
    volumes:
      - db-data:/var/lib/postgresql/data
  backend:
    build: reggie-server
    image: requinard2/reggie-server
    command: python manage.py runserver 0.0.0.0:8000
    deploy:
      mode: global
    environment:
      - PRODUCTION=1
    depends_on:
      - db
  nginx:
    build: reggie-nginx
    image: requinard2/reggie-nginx
    deploy:
      mode: global
    ports:
      - "80:80"
      - "443:443"
    depends_on:
      - "backend"
      - "prometheus"
      - "grafana"
  prometheus:
    build: reggie-prometheus
    image: requinard2/reggie-prometheus
    ports:
      - "9090:9090"
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.role == manager
    volumes:
      - prometheus-data:/prometheus
    depends_on:
      - backend
      - cadvisor
  grafana:
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.role == manager
    image: grafana/grafana:5.1.0
    environment:
      GF_SERVER_ROOT_URL: /grafana
    volumes:
      - grafana-data:/var/lib/grafana
    depends_on:
      - "prometheus"
  cadvisor:
    image: google/cadvisor:latest
    deploy:
      mode: global
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    depends_on:
      - redis
  redis:
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.role == manager
    image: redis:latest
volumes:
  backend-code:
  db-data:
  grafana-data:
  prometheus-data:

Solution

  • After fiddling around with it, the thought occurred to me to try running this specific problem in a different environment than the cloud I had been using. I used docker-machine to create two local instances and it worked instantly. I started digging around a bit and it turned out my firewall wasn't properly configured, which made my nodes unable to communicate with each other.

    So I opened the following ports, as described here:

    • 2377/tcp
    • 7946/tcp and 7946/udp
    • 4789/udp

    This completely solved the problem and my nodes can now properly talk to each other!
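    These are the ports Docker swarm uses for cluster management, node discovery, and overlay network traffic. On a host using ufw, opening them looks like this (a sketch; adapt the commands for firewalld or raw iptables if that's what your hosts run):

    ```shell
    ufw allow 2377/tcp   # cluster management communication
    ufw allow 7946/tcp   # container network discovery
    ufw allow 7946/udp
    ufw allow 4789/udp   # overlay network (VXLAN) traffic
    ufw reload
    ```

    The 4789/udp VXLAN port is the one that matters most for this symptom: without it, each node's DNS can only see the task IPs of containers running locally.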