Tags: docker, prometheus, grafana, docker-swarm, traefik

Why do you need "traefik.docker.network" for some services?


I have a docker swarm setup with traefik in it. I added some services (in this case, grafana and prometheus) with these labels:

  grafana:
    ...
    labels:
      - "router=inbound"
      - "traefik.http.routers.grafana.entrypoints=http"
      - "traefik.http.routers.grafana.rule=Host(`grafana.localhost`)"
      - "traefik.http.routers.grafana.service=grafana"
      - "traefik.http.services.grafana.loadbalancer.server.port=3000"
      - "traefik.docker.network=inbound"
    networks:
      - inbound

  traefik:
    ...
    command:
      - --providers.docker.constraints=Label(`router`,`inbound`)
      - --providers.docker
      - --providers.docker.exposedbydefault=false
      - --entryPoints.http.address=:80
    networks:
      - inbound

Without "traefik.docker.network=inbound" in the grafana labels, I can reach the service and access the UI, but it is very unstable and I often get connection errors. With that line, everything works smoothly.

I wonder what exactly that line does. Why does the service have connection issues (without being totally unreachable) when that line is missing, even though the container is already attached to the same network as the traefik router?


Solution

  • How traefik selects the IP address of a service

    Your example declares these labels for the grafana service:

          - "traefik.http.routers.grafana.service=grafana"
          - "traefik.http.services.grafana.loadbalancer.server.port=3000"
    

    These labels instruct traefik to forward HTTP requests to port 3000 of the grafana service.

    As docker allows each service to be connected to more than one network, the service (e.g. grafana) may have multiple network interfaces with different IP addresses.

    Your HTTP service might bind to all network interfaces, while other services might bind to only a subset of them. The latter will not serve HTTP on some of their interfaces.

    When traefik does not find the label:

          - "traefik.docker.network=inbound"
    

    it asks swarm for an IP address of the grafana service and picks whichever comes first, regardless of the network. If the address it picks is not one your service is serving on, you get timeouts, hence the random connection issues you have experienced.

    When the label traefik.docker.network is present, on the other hand, traefik is pickier and uses only the IP address the service has on the given network. Assuming grafana always serves on that network interface, you get stable responses.
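
    To make the multi-network case concrete, here is a minimal sketch of a grafana service attached to two networks. The `monitoring` network, the image name and the `external: true` setting are assumptions for illustration, not part of the question's setup. With the last label, traefik should only use the address grafana has on `inbound`.

    ```yaml
    # Illustrative sketch only; "monitoring" is a hypothetical second network.
    services:
      grafana:
        image: grafana/grafana   # assumed image name
        networks:
          - inbound      # shared with traefik
          - monitoring   # hypothetical: e.g. shared with prometheus only
        labels:
          - "router=inbound"
          - "traefik.http.routers.grafana.entrypoints=http"
          - "traefik.http.routers.grafana.rule=Host(`grafana.localhost`)"
          - "traefik.http.routers.grafana.service=grafana"
          - "traefik.http.services.grafana.loadbalancer.server.port=3000"
          # Pin traefik to the address grafana has on the shared network.
          - "traefik.docker.network=inbound"

    networks:
      inbound:
        external: true   # assumption: created beforehand, so the name is exactly "inbound"
      monitoring: {}
    ```

    Note that when a network is created by a stack, docker prefixes its name with the stack name, so the value of the label has to match the full network name as docker reports it.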

    Cure for services such as grafana, virtuoso and others

    Many services work perfectly without that label: whoami, nginx, and python web applications using waitress or uvicorn all worked for us without any issue.

    Other services, such as grafana or virtuoso, might become unstable or totally unusable in docker swarm with traefik unless they carry the traefik.docker.network label.
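
    If many services need the label, one option is to set a default network on the provider instead of labelling each service. The sketch below assumes the Traefik v2 flag naming used in the question, where `--providers.docker.network` sets the default network and a per-service traefik.docker.network label can still override it. Treat it as a starting point to check against the documentation of your traefik version.

    ```yaml
      traefik:
        ...
        command:
          - --providers.docker
          - --providers.docker.exposedbydefault=false
          - --providers.docker.constraints=Label(`router`,`inbound`)
          # Default network traefik uses to reach containers (assumes Traefik v2 naming).
          - --providers.docker.network=inbound
          - --entryPoints.http.address=:80
        networks:
          - inbound
    ```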

    My colleague Milan Holub and I spent around 18 hours trying to resolve a mysterious issue with virtuoso (it responded only to the first request, and its IP address kept changing in the traefik dashboard...), and the traefik.docker.network label was the final solution.