
CoreDNS is forwarding ALL DNS queries to local router, including those for in-cluster service names


I'm currently dealing with a CoreDNS-related issue that occurs on a fresh Kubernetes setup on a Raspberry Pi.

Issue: CoreDNS forwards ALL DNS queries to the local gateway/router, which of course has no clue how to resolve in-cluster service names, no matter how fully qualified the name is.

How I diagnosed the issue:

Any nslookup query results in an NXDOMAIN ("non-existent domain") response, and the response always comes from the local router.

NOTE: in the following outputs, 10.32.0.2 is the IP of one of the CoreDNS pods, soc.local is the cluster's domain name, and wpad.fritz.box is the hostname of the local router.

$ kubectl run -ti --rm alpine --image=alpine --restart=Never -- ash

/ # nslookup kubernetes 10.32.0.2
Server:     10.32.0.2
Address:    10.32.0.2:53

** server can't find kubernetes: NXDOMAIN

** server can't find kubernetes: NXDOMAIN

/ # nslookup kubernetes.default 10.32.0.2
Server:     10.32.0.2
Address:    10.32.0.2:53

** server can't find kubernetes.default: NXDOMAIN

** server can't find kubernetes.default: NXDOMAIN

/ # nslookup kubernetes.default.soc 10.32.0.2
Server:     10.32.0.2
Address:    10.32.0.2:53

** server can't find kubernetes.default.soc: NXDOMAIN

** server can't find kubernetes.default.soc: NXDOMAIN

/ # nslookup kubernetes.default.soc.local 10.32.0.2
Server:     10.32.0.2
Address:    10.32.0.2:53

** server can't find kubernetes.default.soc.local: NXDOMAIN

** server can't find kubernetes.default.soc.local: NXDOMAIN

The following tcpdump output shows the network traffic associated with the nslookup query for kubernetes:

/ # tcpdump -i weave host 10.32.0.2 and port 53
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on weave, link-type EN10MB (Ethernet), capture size 262144 bytes

16:57:48.047794 IP 10.32.0.5.54782 > 10.32.0.2.53: 42507+ A? kubernetes. (28)
16:57:48.048136 IP 10.32.0.5.54782 > 10.32.0.2.53: 43025+ AAAA? kubernetes. (28)
16:57:48.048576 IP 10.32.0.2.35867 > wpad.fritz.box.53: 42507+ A? kubernetes. (28)
16:57:48.048576 IP 10.32.0.2.37755 > wpad.fritz.box.53: 43025+ AAAA? kubernetes. (28)
16:57:48.050611 IP wpad.fritz.box.53 > 10.32.0.2.35867: 42507 NXDomain 0/1/0 (103)
16:57:48.050916 IP wpad.fritz.box.53 > 10.32.0.2.37755: 43025 NXDomain 0/1/0 (103)
16:57:48.051109 IP 10.32.0.2.53 > 10.32.0.5.54782: 42507 NXDomain 0/1/0 (103)
16:57:48.051503 IP 10.32.0.2.53 > 10.32.0.5.54782: 43025 NXDomain 0/1/0 (103)
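One detail worth noting in the capture: the client sends the bare name kubernetes. to CoreDNS, rather than a name expanded with the pod's /etc/resolv.conf search list. A glibc-style resolver with a kubelet-generated resolv.conf (for this cluster presumably search default.svc.soc.local svc.soc.local soc.local and options ndots:5 — assumed here, not taken from the post) would try the search suffixes first for a single-label name. A rough sketch of that expansion logic:

```python
# Sketch of glibc-style search-list expansion. The search domains below are an
# assumption based on the usual kubelet-generated resolv.conf for a cluster
# whose domain is soc.local; they are not shown in the original post.
SEARCH = ["default.svc.soc.local", "svc.soc.local", "soc.local"]
NDOTS = 5

def candidates(name: str) -> list[str]:
    """Return the fully qualified names a resolver should try, in order."""
    if name.endswith("."):  # already fully qualified: no expansion
        return [name]
    absolute = name + "."
    expanded = [f"{name}.{suffix}." for suffix in SEARCH]
    # With fewer than ndots dots, the search suffixes are tried before the
    # bare name; otherwise the bare name is tried first.
    return expanded + [absolute] if name.count(".") < NDOTS else [absolute] + expanded

print(candidates("kubernetes"))
# → ['kubernetes.default.svc.soc.local.', 'kubernetes.svc.soc.local.',
#    'kubernetes.soc.local.', 'kubernetes.']
```

The capture shows only the bare kubernetes. query arriving at CoreDNS, which hints that the client's resolver is skipping the search list entirely rather than CoreDNS misbehaving.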

The following are the CoreDNS logs corresponding to the nslookup queries:

[INFO] 10.32.0.5:53591 - 23327 "AAAA IN kubernetes. udp 28 false 512" NXDOMAIN qr,aa,rd,ra 103 0.000318349s
[INFO] 10.32.0.5:53591 - 22735 "A IN kubernetes. udp 28 false 512" NXDOMAIN qr,aa,rd,ra 103 0.000447718s
[INFO] 10.32.0.5:58545 - 49038 "AAAA IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 111 0.0314311s
[INFO] 10.32.0.5:58545 - 48445 "A IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 111 0.033794968s
[INFO] 10.32.0.5:53665 - 62210 "A IN kubernetes.default.soc. udp 40 false 512" NXDOMAIN qr,rd,ra 115 0.047918913s
[INFO] 10.32.0.5:53665 - 62802 "AAAA IN kubernetes.default.soc. udp 40 false 512" NXDOMAIN qr,rd,ra 115 0.067865341s
[INFO] 10.32.0.5:56021 - 47416 "A IN kubernetes.default.soc.local. udp 46 false 512" NXDOMAIN qr,aa,rd 127 0.000430478s
[INFO] 10.32.0.5:56021 - 48046 "AAAA IN kubernetes.default.soc.local. udp 46 false 512" NXDOMAIN qr,aa,rd 127 0.000551032s

The following is the CoreDNS ConfigMap containing the Corefile:

$ k get cm coredns -n kube-system -o yaml
apiVersion: v1
data:
  Corefile: |
    .:53 {
        log
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes soc.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  creationTimestamp: "2020-07-07T20:58:06Z"
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:data: {}
    manager: kubeadm
    operation: Update
    time: "2020-07-07T20:58:06Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:data:
        f:Corefile: {}
    manager: kubectl
    operation: Update
    time: "2020-07-28T17:21:46Z"
  name: coredns
  namespace: kube-system
  resourceVersion: "2464367"
  selfLink: /api/v1/namespaces/kube-system/configmaps/coredns
  uid: c6a603c3-30b6-4156-b62e-a98d53761541
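The Corefile itself looks consistent: the kubernetes plugin is authoritative for the zones soc.local, in-addr.arpa and ip6.arpa, and only queries outside those zones fall through to forward . /etc/resolv.conf. A minimal sketch of that suffix-based zone matching (the zones are taken from the Corefile above; the matching logic is a simplification of what CoreDNS does):

```python
# Zones owned by the kubernetes plugin, per the Corefile above.
ZONES = ["soc.local.", "in-addr.arpa.", "ip6.arpa."]

def handled_by_kubernetes_plugin(qname: str) -> bool:
    """True if qname falls inside a zone the kubernetes plugin is authoritative for."""
    if not qname.endswith("."):
        qname += "."  # normalize to a fully qualified name
    return any(qname == zone or qname.endswith("." + zone) for zone in ZONES)

print(handled_by_kubernetes_plugin("kubernetes."))                        # bare name: forwarded upstream
print(handled_by_kubernetes_plugin("kubernetes.default.svc.soc.local."))  # in-zone: answered by the plugin
```

So forwarding a bare kubernetes. query to the router is expected behavior; only names under soc.local are answered locally. Note also that the full service FQDN includes the svc label (kubernetes.default.svc.soc.local), which would explain why kubernetes.default.soc.local gets an authoritative (aa) NXDOMAIN in the logs above: it is in-zone but not a valid service name.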

My question is: Why is CoreDNS not handling these DNS queries for in-cluster service names?

I'm not sure what else to debug. It's also a shame that the CoreDNS image doesn't ship a shell, so I can't take a look at its /etc/resolv.conf file. Any suggestions?


Solution

  • Shortly after posting the question I re-read the Kubernetes documentation on debugging DNS resolution, and the very last paragraph of its Known Issues section mentions that some Alpine versions have DNS problems. While the linked GitHub issue doesn't describe my problem in exactly the same way, the Alpine version does seem to be the culprit:

    $ kubectl run -ti --rm alpine --image=alpine:3.9.6 --restart=Never -- ash
    If you don't see a command prompt, try pressing enter.
    / # nslookup kubernetes 10.32.0.2
    Server:    10.32.0.2
    Address 1: 10.32.0.2 10-32-0-2.kube-dns.kube-system.svc.soc.local
    
    Name:      kubernetes
    Address 1: 10.96.0.1 kubernetes.default.svc.soc.local
    / # pod "alpine" deleted
    
    $ kubectl run -ti --rm alpine --image=alpine --restart=Never -- ash
    If you don't see a command prompt, try pressing enter.
    / # nslookup kubernetes 10.32.0.2
    Server:     10.32.0.2
    Address:    10.32.0.2:53
    
    ** server can't find kubernetes: NXDOMAIN
    
    ** server can't find kubernetes: NXDOMAIN