I am currently dealing with a CoreDNS-related issue that occurs on a fresh Kubernetes setup on a Raspberry Pi.
Issue: CoreDNS forwards ALL DNS queries to the local gateway/router, which has no idea how to resolve in-cluster service names, no matter how fully qualified the query is.
How I diagnosed the issue:
Performing any nslookup query results in an NXDOMAIN ("non-existent domain") response, and this response always comes from the local router.
NOTE: in the following outputs, 10.32.0.2 is the IP of one of the CoreDNS pods, soc.local is the domain name of the cluster, and wpad.fritz.box is the hostname of the local router.
$ kubectl run -ti --rm alpine --image=alpine --restart=Never -- ash
/ # nslookup kubernetes 10.32.0.2
Server: 10.32.0.2
Address: 10.32.0.2:53
** server can't find kubernetes: NXDOMAIN
** server can't find kubernetes: NXDOMAIN
/ # nslookup kubernetes.default 10.32.0.2
Server: 10.32.0.2
Address: 10.32.0.2:53
** server can't find kubernetes.default: NXDOMAIN
** server can't find kubernetes.default: NXDOMAIN
/ # nslookup kubernetes.default.soc 10.32.0.2
Server: 10.32.0.2
Address: 10.32.0.2:53
** server can't find kubernetes.default.soc: NXDOMAIN
** server can't find kubernetes.default.soc: NXDOMAIN
/ # nslookup kubernetes.default.soc.local 10.32.0.2
Server: 10.32.0.2
Address: 10.32.0.2:53
** server can't find kubernetes.default.soc.local: NXDOMAIN
** server can't find kubernetes.default.soc.local: NXDOMAIN
The following is the tcpdump output showing the network traffic associated with the nslookup query for kubernetes:
/ # tcpdump -i weave host 10.32.0.2 and port 53
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on weave, link-type EN10MB (Ethernet), capture size 262144 bytes
16:57:48.047794 IP 10.32.0.5.54782 > 10.32.0.2.53: 42507+ A? kubernetes. (28)
16:57:48.048136 IP 10.32.0.5.54782 > 10.32.0.2.53: 43025+ AAAA? kubernetes. (28)
16:57:48.048576 IP 10.32.0.2.35867 > wpad.fritz.box.53: 42507+ A? kubernetes. (28)
16:57:48.048576 IP 10.32.0.2.37755 > wpad.fritz.box.53: 43025+ AAAA? kubernetes. (28)
16:57:48.050611 IP wpad.fritz.box.53 > 10.32.0.2.35867: 42507 NXDomain 0/1/0 (103)
16:57:48.050916 IP wpad.fritz.box.53 > 10.32.0.2.37755: 43025 NXDomain 0/1/0 (103)
16:57:48.051109 IP 10.32.0.2.53 > 10.32.0.5.54782: 42507 NXDomain 0/1/0 (103)
16:57:48.051503 IP 10.32.0.2.53 > 10.32.0.5.54782: 43025 NXDomain 0/1/0 (103)
The following are the CoreDNS logs corresponding to the nslookup queries:
[INFO] 10.32.0.5:53591 - 23327 "AAAA IN kubernetes. udp 28 false 512" NXDOMAIN qr,aa,rd,ra 103 0.000318349s
[INFO] 10.32.0.5:53591 - 22735 "A IN kubernetes. udp 28 false 512" NXDOMAIN qr,aa,rd,ra 103 0.000447718s
[INFO] 10.32.0.5:58545 - 49038 "AAAA IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 111 0.0314311s
[INFO] 10.32.0.5:58545 - 48445 "A IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 111 0.033794968s
[INFO] 10.32.0.5:53665 - 62210 "A IN kubernetes.default.soc. udp 40 false 512" NXDOMAIN qr,rd,ra 115 0.047918913s
[INFO] 10.32.0.5:53665 - 62802 "AAAA IN kubernetes.default.soc. udp 40 false 512" NXDOMAIN qr,rd,ra 115 0.067865341s
[INFO] 10.32.0.5:56021 - 47416 "A IN kubernetes.default.soc.local. udp 46 false 512" NXDOMAIN qr,aa,rd 127 0.000430478s
[INFO] 10.32.0.5:56021 - 48046 "AAAA IN kubernetes.default.soc.local. udp 46 false 512" NXDOMAIN qr,aa,rd 127 0.000551032s
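One way to read the response flags in these logs: aa (authoritative answer) marks responses CoreDNS produced itself, while its absence suggests the answer came back from an upstream resolver. A small sketch that pulls the flags field out of one of the log lines above (the split logic assumes the standard CoreDNS log plugin format):

```python
# Extract the response-flags field from a CoreDNS query log line.
# Format assumed: [INFO] client - id "TYPE CLASS name. proto size do bufsize" RCODE flags size duration
line = ('[INFO] 10.32.0.5:58545 - 49038 "AAAA IN kubernetes.default. '
        'udp 36 false 512" NXDOMAIN qr,rd,ra 111 0.0314311s')

# Split off the quoted query section, then take the second field (the flags).
flags = line.split('" ')[1].split()[1].split(",")
print(flags)          # ['qr', 'rd', 'ra']
print("aa" in flags)  # False -> this NXDOMAIN was not answered authoritatively
```

In the logs above, the kubernetes.default. and kubernetes.default.soc. entries lack the aa flag, which is consistent with the tcpdump trace showing those queries being answered by the router.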
The following is the ConfigMap containing the CoreDNS Corefile:
$ k get cm coredns -n kube-system -o yaml
apiVersion: v1
data:
  Corefile: |
    .:53 {
        log
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes soc.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  creationTimestamp: "2020-07-07T20:58:06Z"
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:data: {}
    manager: kubeadm
    operation: Update
    time: "2020-07-07T20:58:06Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:data:
        f:Corefile: {}
    manager: kubectl
    operation: Update
    time: "2020-07-28T17:21:46Z"
  name: coredns
  namespace: kube-system
  resourceVersion: "2464367"
  selfLink: /api/v1/namespaces/kube-system/configmaps/coredns
  uid: c6a603c3-30b6-4156-b62e-a98d53761541
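For context on why these queries reach the router at all: in this Corefile the kubernetes plugin is authoritative only for soc.local (plus the reverse zones), and anything it doesn't answer is handled by forward . /etc/resolv.conf, i.e. the resolvers of the underlying node. An absolute query such as kubernetes. (with a trailing dot, as seen in the tcpdump output) falls outside the soc.local zone, so CoreDNS behaves exactly as configured and forwards it upstream. An annotated excerpt of the relevant lines (comments mine):

```
.:53 {
    # Authoritative for the cluster zone and the reverse zones only;
    # names outside soc.local / in-addr.arpa / ip6.arpa skip this plugin.
    kubernetes soc.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
        ttl 30
    }
    # Everything not answered above is sent to the nameservers listed in
    # the pod's /etc/resolv.conf -- on this setup, the Fritz!Box router.
    forward . /etc/resolv.conf
}
```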
My question is: Why is CoreDNS not handling these DNS queries for in-cluster service names?
I'm not sure what else to debug. It's also a shame that the CoreDNS images don't have a shell, so I can't take a look at the /etc/resolv.conf file.
Any suggestions?
Shortly after posting the question, I re-read the Kubernetes documentation on debugging DNS resolution. The very last paragraph of its Known Issues section mentions that some Alpine versions have DNS problems. While the GitHub ticket linked there doesn't describe my problem in exactly the same way, the Alpine version does seem to be the issue:
$ kubectl run -ti --rm alpine --image=alpine:3.9.6 --restart=Never -- ash
If you don't see a command prompt, try pressing enter.
/ # nslookup kubernetes 10.32.0.2
Server: 10.32.0.2
Address 1: 10.32.0.2 10-32-0-2.kube-dns.kube-system.svc.soc.local
Name: kubernetes
Address 1: 10.96.0.1 kubernetes.default.svc.soc.local
/ # pod "alpine" deleted
$ kubectl run -ti --rm alpine --image=alpine --restart=Never -- ash
If you don't see a command prompt, try pressing enter.
/ # nslookup kubernetes 10.32.0.2
Server: 10.32.0.2
Address: 10.32.0.2:53
** server can't find kubernetes: NXDOMAIN
** server can't find kubernetes: NXDOMAIN