
Docker Traefik can't resolve DNS (Fails reaching server and obtaining certificates)


Googling this issue shows that it has been reported before; however, none of the existing posts really gives an answer.

When starting Traefik (v2.2.1, a.k.a. latest) as a container in Docker, no matter what I try, I keep getting the following error for ALL the configured domains:

time="2020-05-24T15:48:57Z" level=error msg="Unable to obtain ACME certificate for domains \"<my domain>\": cannot get ACME client get directory at 'https://acme-staging-v02.api.letsencrypt.org/directory': Get \"https://acme-staging-v02.api.letsencrypt.org/directory\": dial tcp: lookup acme-staging-v02.api.letsencrypt.org on 127.0.0.11:53: read udp 127.0.0.1:44687->127.0.0.11:53: i/o timeout" routerName=traefik@docker rule="Host(`<my domain>`)" providerName=le.acme

According to https://letsencrypt.status.io/, this doesn't seem to be a problem with Let's Encrypt's servers.


I have tried two different OSs on the server: Debian 10 and Ubuntu Server (18.04 and 20.04). When installing the OS, I always follow the guide I wrote for myself here: https://gist.github.com/D3strukt0r/5aaba1a021d16b31fa19adf6eb26a102

Yes, I do as little as possible in the system and as much as possible with the containers.


The following is my docker-compose.yml for Traefik:

version: "2"

# Manage domain access to services
services:
  traefik:
    container_name: traefik
    image: traefik
    command:
      - --log.level=DEBUG
      - --api.dashboard=true
      - --providers.docker=true
      - --providers.docker.exposedbydefault=false
      - --providers.docker.network=traefik_proxy
      - --entrypoints.http.address=:80
      - --entrypoints.https.address=:443
      - --certificatesresolvers.le.acme.email=${ACME_EMAIL}
      - --certificatesresolvers.le.acme.storage=acme.json
      - --certificatesresolvers.le.acme.caserver=https://acme-staging-v02.api.letsencrypt.org/directory
      - --certificatesresolvers.le.acme.dnschallenge=true
      - --certificatesresolvers.le.acme.dnschallenge.provider=cloudflare
      # - --certificatesresolvers.le.acme.dnschallenge.resolvers=1.1.1.1:53,8.8.8.8:53
    restart: always
    networks:
      - traefik_proxy
    ports:
      - 80:80
      - 443:443
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      #- ./acme.json:/acme.json
      - ./acme_testing.json:/acme.json
    environment:
      CF_API_EMAIL: ${CF_API_EMAIL}
      CF_API_KEY: ${CF_API_KEY}
    labels:
      - traefik.enable=true

      - traefik.http.routers.traefik0.entrypoints=http
      - traefik.http.routers.traefik0.rule=Host(`<my domain>`)
      - traefik.http.routers.traefik0.middlewares=to_https

      - traefik.http.routers.traefik.entrypoints=https
      - traefik.http.routers.traefik.rule=Host(`<my domain>`)
      - traefik.http.routers.traefik.middlewares=traefik_auth
      - traefik.http.routers.traefik.tls=true
      - traefik.http.routers.traefik.tls.certresolver=le
      - traefik.http.routers.traefik.service=api@internal

      # Declaring the user list
      #
      # Note: all dollar signs in the hash need to be doubled for escaping.
      # To create user:password pair, it's possible to use this command:
      # echo $(htpasswd -nb user password) | sed -e s/\\$/\\$\\$/g
      - traefik.http.middlewares.traefik_auth.basicauth.users=${TRAEFIK_USERS}

      # Standard middleware for other containers to use
      - traefik.http.middlewares.to_https.redirectscheme.scheme=https
      - traefik.http.middlewares.to_https_perm.redirectscheme.scheme=https
      - traefik.http.middlewares.to_https_perm.redirectscheme.permanent=true

networks:
  traefik_proxy:
    external: true

The folder structure in there:

root@server:/opt/traefik# ls -Al
total 8
-rw------- 1 root root      0 May 24 00:37 acme.json
-rw------- 1 root root      0 May 24 00:37 acme_testing.json
-rw-rw-r-- 1 root docker 2406 May 24 18:04 docker-compose.yml
-rw-rw-r-- 1 root docker  185 May 23 23:49 .env

That's all there is to the configuration.

An nslookup on the host (outside the container) gives the following:

root@server:/opt/traefik# nslookup acme-staging-v02.api.letsencrypt.org
Server:         192.168.1.1
Address:        192.168.1.1#53

Non-authoritative answer:
acme-staging-v02.api.letsencrypt.org    canonical name = staging.api.letsencrypt.org.
staging.api.letsencrypt.org     canonical name = 56a5f4b0bc8146689ec3e272c43525f9.pacloudflare.com.
Name:   56a5f4b0bc8146689ec3e272c43525f9.pacloudflare.com
Address: 172.65.46.172
Name:   56a5f4b0bc8146689ec3e272c43525f9.pacloudflare.com
Address: 2606:4700:60:0:f41b:d4fe:4325:6026

An nslookup INSIDE the container gives the following:

manuele@server:/opt$ docker exec -it traefik /bin/sh
/ # nslookup acme-staging-v02.api.letsencrypt.org
;; connection timed out; no servers could be reached

For further information, here is the full log:

root@server:/opt/traefik# docker-compose up
Recreating traefik ... done
Attaching to traefik
traefik    | time="2020-05-24T16:05:34Z" level=info msg="Configuration loaded from flags."
traefik    | time="2020-05-24T16:05:34Z" level=info msg="Traefik version 2.2.1 built on 2020-04-29T18:02:09Z"
traefik    | time="2020-05-24T16:05:34Z" level=debug msg="Static configuration loaded {\"global\":{\"checkNewVersion\":true},\"serversTransport\":{\"maxIdleConnsPerHost\":200},\"entryPoints\":{\"http\":{\"address\":\":80\",\"transport\":{\"lifeCycle\":{\"graceTimeOut\":10000000000},\"respondingTimeouts\":{\"idleTimeout\":180000000000}},\"forwardedHeaders\":{},\"http\":{}},\"https\":{\"address\":\":443\",\"transport\":{\"lifeCycle\":{\"graceTimeOut\":10000000000},\"respondingTimeouts\":{\"idleTimeout\":180000000000}},\"forwardedHeaders\":{},\"http\":{}}},\"providers\":{\"providersThrottleDuration\":2000000000,\"docker\":{\"watch\":true,\"endpoint\":\"unix:///var/run/docker.sock\",\"defaultRule\":\"Host(`{{ normalize .Name }}`)\",\"network\":\"traefik_proxy\",\"swarmModeRefreshSeconds\":15000000000}},\"api\":{\"dashboard\":true},\"log\":{\"level\":\"DEBUG\",\"format\":\"common\"},\"certificatesResolvers\":{\"le\":{\"acme\":{\"email\":\"<ACME Email>\",\"caServer\":\"https://acme-staging-v02.api.letsencrypt.org/directory\",\"storage\":\"acme.json\",\"keyType\":\"RSA4096\",\"dnsChallenge\":{\"provider\":\"cloudflare\"}}}}}"
traefik    | time="2020-05-24T16:05:34Z" level=info msg="\nStats collection is disabled.\nHelp us improve Traefik by turning this feature on :)\nMore details on: https://docs.traefik.io/contributing/data-collection/\n"
traefik    | time="2020-05-24T16:05:34Z" level=info msg="Starting provider aggregator.ProviderAggregator {}"
traefik    | time="2020-05-24T16:05:34Z" level=debug msg="Start TCP Server" entryPointName=https
traefik    | time="2020-05-24T16:05:34Z" level=info msg="Starting provider *acme.Provider {\"email\":\"<ACME Email>\",\"caServer\":\"https://acme-staging-v02.api.letsencrypt.org/directory\",\"storage\":\"acme.json\",\"keyType\":\"RSA4096\",\"dnsChallenge\":{\"provider\":\"cloudflare\"},\"ResolverName\":\"le\",\"store\":{},\"ChallengeStore\":{}}"
traefik    | time="2020-05-24T16:05:34Z" level=debug msg="Start TCP Server" entryPointName=http
traefik    | time="2020-05-24T16:05:34Z" level=info msg="Testing certificate renew..." providerName=le.acme
traefik    | time="2020-05-24T16:05:34Z" level=info msg="Starting provider *docker.Provider {\"watch\":true,\"endpoint\":\"unix:///var/run/docker.sock\",\"defaultRule\":\"Host(`{{ normalize .Name }}`)\",\"network\":\"traefik_proxy\",\"swarmModeRefreshSeconds\":15000000000}"
traefik    | time="2020-05-24T16:05:34Z" level=info msg="Starting provider *traefik.Provider {}"
traefik    | time="2020-05-24T16:05:34Z" level=debug msg="Configuration received from provider le.acme: {\"http\":{},\"tls\":{}}" providerName=le.acme
traefik    | time="2020-05-24T16:05:34Z" level=debug msg="Configuration received from provider internal: {\"http\":{\"services\":{\"api\":{},\"dashboard\":{},\"noop\":{}}},\"tcp\":{},\"tls\":{}}" providerName=internal
traefik    | time="2020-05-24T16:05:34Z" level=debug msg="No default certificate, generating one"
traefik    | time="2020-05-24T16:05:34Z" level=debug msg="Provider connection established with docker 19.03.9 (API 1.40)" providerName=docker
traefik    | time="2020-05-24T16:05:34Z" level=debug msg="Configuration received from provider docker: {\"http\":{\"routers\":{\"traefik\":{\"entryPoints\":[\"https\"],\"middlewares\":[\"traefik_auth\"],\"service\":\"api@internal\",\"rule\":\"Host(`<my domain>`)\",\"tls\":{\"certResolver\":\"le\"}},\"traefik0\":{\"entryPoints\":[\"http\"],\"middlewares\":[\"to_https\"],\"service\":\"traefik-traefik\",\"rule\":\"Host(`<my domain>`)\"}},\"services\":{\"traefik-traefik\":{\"loadBalancer\":{\"servers\":[{\"url\":\"http://172.18.0.2:80\"}],\"passHostHeader\":true}}},\"middlewares\":{\"to_https\":{\"redirectScheme\":{\"scheme\":\"https\"}},\"to_https_perm\":{\"redirectScheme\":{\"scheme\":\"https\",\"permanent\":true}},\"traefik_auth\":{\"basicAuth\":{\"users\":[\"<traefik users>\"]}}}},\"tcp\":{},\"udp\":{}}" providerName=docker
traefik    | time="2020-05-24T16:05:34Z" level=debug msg="No default certificate, generating one"
traefik    | time="2020-05-24T16:05:34Z" level=debug msg="Creating middleware" serviceName=traefik-traefik entryPointName=http routerName=traefik0@docker middlewareName=pipelining middlewareType=Pipelining
traefik    | time="2020-05-24T16:05:34Z" level=debug msg="Creating load-balancer" entryPointName=http routerName=traefik0@docker serviceName=traefik-traefik
traefik    | time="2020-05-24T16:05:34Z" level=debug msg="Creating server 0 http://172.18.0.2:80" routerName=traefik0@docker serviceName=traefik-traefik serverName=0 entryPointName=http
traefik    | time="2020-05-24T16:05:34Z" level=debug msg="Added outgoing tracing middleware traefik-traefik" middlewareType=TracingForwarder routerName=traefik0@docker entryPointName=http middlewareName=tracing
traefik    | time="2020-05-24T16:05:34Z" level=debug msg="Creating middleware" entryPointName=http routerName=traefik0@docker middlewareName=to_https@docker middlewareType=RedirectScheme
traefik    | time="2020-05-24T16:05:34Z" level=debug msg="Setting up redirection to https " entryPointName=http routerName=traefik0@docker middlewareName=to_https@docker middlewareType=RedirectScheme
traefik    | time="2020-05-24T16:05:34Z" level=debug msg="Adding tracing to middleware" entryPointName=http routerName=traefik0@docker middlewareName=to_https@docker
traefik    | time="2020-05-24T16:05:34Z" level=debug msg="Creating middleware" middlewareType=Recovery entryPointName=http middlewareName=traefik-internal-recovery
traefik    | time="2020-05-24T16:05:34Z" level=debug msg="Added outgoing tracing middleware api@internal" middlewareName=tracing middlewareType=TracingForwarder entryPointName=https routerName=traefik@docker
traefik    | time="2020-05-24T16:05:34Z" level=debug msg="Creating middleware" middlewareType=BasicAuth routerName=traefik@docker entryPointName=https middlewareName=traefik_auth@docker
traefik    | time="2020-05-24T16:05:34Z" level=debug msg="Adding tracing to middleware" routerName=traefik@docker middlewareName=traefik_auth@docker entryPointName=https
traefik    | time="2020-05-24T16:05:34Z" level=debug msg="Creating middleware" entryPointName=https middlewareName=traefik-internal-recovery middlewareType=Recovery
traefik    | time="2020-05-24T16:05:34Z" level=debug msg="No default certificate, generating one"
traefik    | time="2020-05-24T16:05:34Z" level=debug msg="Try to challenge certificate for domain [<my domain>] found in HostSNI rule" providerName=le.acme rule="Host(`<my domain>`)" routerName=traefik@docker
traefik    | time="2020-05-24T16:05:34Z" level=debug msg="Looking for provided certificate(s) to validate [\"<my domain>\"]..." providerName=le.acme rule="Host(`<my domain>`)" routerName=traefik@docker
traefik    | time="2020-05-24T16:05:34Z" level=debug msg="Domains [\"<my domain>\"] need ACME certificates generation for domains \"<my domain>\"." providerName=le.acme rule="Host(`<my domain>`)" routerName=traefik@docker
traefik    | time="2020-05-24T16:05:34Z" level=debug msg="Loading ACME certificates [<my domain>]..." providerName=le.acme rule="Host(`<my domain>`)" routerName=traefik@docker
traefik    | time="2020-05-24T16:05:35Z" level=debug msg="Building ACME client..." providerName=le.acme
traefik    | time="2020-05-24T16:05:35Z" level=debug msg="https://acme-staging-v02.api.letsencrypt.org/directory" providerName=le.acme
traefik    | time="2020-05-24T16:05:55Z" level=error msg="Unable to obtain ACME certificate for domains \"<my domain>\": cannot get ACME client get directory at 'https://acme-staging-v02.api.letsencrypt.org/directory': Get \"https://acme-staging-v02.api.letsencrypt.org/directory\": dial tcp: lookup acme-staging-v02.api.letsencrypt.org on 127.0.0.11:53: read udp 127.0.0.1:49272->127.0.0.11:53: i/o timeout" routerName=traefik@docker providerName=le.acme rule="Host(`<my domain>`)"

Another option is to use docker run ... instead of docker-compose, so let's try that:

docker run -it \
    -v /var/run/docker.sock:/var/run/docker.sock:ro \
    -v /opt/traefik/acme_testing.json:/acme.json \
    -e CF_API_EMAIL="<Cloudflare Email>" \
    -e CF_API_KEY="<Cloudflare API>" \
    -p 80:80 \
    -p 443:443 \
    --network traefik_proxy \
    --name traefik \
    traefik \
    --log.level=DEBUG \
    --api.dashboard=true \
    --providers.docker=true \
    --providers.docker.exposedbydefault=false \
    --providers.docker.network=traefik_proxy \
    --entrypoints.http.address=:80 \
    --entrypoints.https.address=:443 \
    --certificatesresolvers.le.acme.email="<ACME Email>" \
    --certificatesresolvers.le.acme.storage=acme.json \
    --certificatesresolvers.le.acme.caserver="https://acme-staging-v02.api.letsencrypt.org/directory" \
    --certificatesresolvers.le.acme.dnschallenge=true \
    --certificatesresolvers.le.acme.dnschallenge.provider=cloudflare

Which gives:

root@server:/opt/traefik# docker exec -it traefik /bin/sh
/ # nslookup acme-staging-v02.api.letsencrypt.org
;; connection timed out; no servers could be reached

Alright, trying again without the --network option:

docker run -it \
    -v /var/run/docker.sock:/var/run/docker.sock:ro \
    -v /opt/traefik/acme_testing.json:/acme.json \
    -e CF_API_EMAIL="<Cloudflare Email>" \
    -e CF_API_KEY="<Cloudflare API>" \
    -p 80:80 \
    -p 443:443 \
    --name traefik \
    traefik \
    --log.level=DEBUG \
    --api.dashboard=true \
    --providers.docker=true \
    --providers.docker.exposedbydefault=false \
    --providers.docker.network=traefik_proxy \
    --entrypoints.http.address=:80 \
    --entrypoints.https.address=:443 \
    --certificatesresolvers.le.acme.email="<ACME Email>" \
    --certificatesresolvers.le.acme.storage=acme.json \
    --certificatesresolvers.le.acme.caserver="https://acme-staging-v02.api.letsencrypt.org/directory" \
    --certificatesresolvers.le.acme.dnschallenge=true \
    --certificatesresolvers.le.acme.dnschallenge.provider=cloudflare

Which leads to:

root@server:/opt/traefik# docker exec -it traefik /bin/sh
/ # nslookup acme-staging-v02.api.letsencrypt.org
nslookup: write to '192.168.1.233': Connection refused
Server:         192.168.1.1
Address:        192.168.1.1:53

Non-authoritative answer:
acme-staging-v02.api.letsencrypt.org    canonical name = staging.api.letsencrypt.org
staging.api.letsencrypt.org     canonical name = 56a5f4b0bc8146689ec3e272c43525f9.pacloudflare.com
Name:   56a5f4b0bc8146689ec3e272c43525f9.pacloudflare.com
Address: 172.65.46.172

Non-authoritative answer:
acme-staging-v02.api.letsencrypt.org    canonical name = staging.api.letsencrypt.org
staging.api.letsencrypt.org     canonical name = 56a5f4b0bc8146689ec3e272c43525f9.pacloudflare.com
Name:   56a5f4b0bc8146689ec3e272c43525f9.pacloudflare.com
Address: 2606:4700:60:0:f41b:d4fe:4325:6026

Through all of this, the acme files stayed empty, so the problem persists.

root@server:/opt/traefik# ls -Al
total 12
-rw------- 1 root    root       0 May 24 00:37 acme.json
-rw------- 1 root    root       0 May 24 00:37 acme_testing.json
-rw-rw-r-- 1 root    docker  2406 May 24 18:04 docker-compose.yml
-rw-rw-r-- 1 root    docker   185 May 23 23:49 .env

If someone can help fix this, thank you very much in advance.

If you need more information beyond what I have already added, feel free to ask and I'll provide it.


Solution

  • After hours of tinkering, I found out that this problem seems to be fairly common among docker-compose setups. The fix is actually pretty simple.

    Add the following to each container that needs to talk to the outside world:

    version: "2"
    
    services:
      <the service>:
        ...
        dns:
          - 1.1.1.1
          - 1.0.0.1
        ...
    

    This tells the DNS resolver inside the container (which listens on 127.0.0.11) to forward queries to these DNS servers, instead of to whatever was preventing it from talking to the outside world.
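
    Applied to the Traefik service from the question, the change might look like the sketch below. Only the dns key is new; the rest is abbreviated from the original docker-compose.yml. The two addresses are Cloudflare's public resolvers, as used above, and that choice is an assumption: any DNS server the host can reach should work the same way.

    ```yaml
    version: "2"

    services:
      traefik:
        container_name: traefik
        image: traefik
        # Upstream DNS servers that Docker's embedded resolver (127.0.0.11)
        # forwards external lookups to for this container.
        dns:
          - 1.1.1.1
          - 1.0.0.1
        networks:
          - traefik_proxy
        # command, ports, volumes, environment and labels stay as in the question

    networks:
      traefik_proxy:
        external: true
    ```

    After recreating the container (for example with docker-compose up -d), repeating the nslookup from inside the container is a quick way to check whether name resolution now works.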