Tags: kubernetes, network-programming, k3s, coredns, k3sup

Connection refused from pod to pod via service clusterIP


Something went wrong with my RPi 4 cluster, which is based on k3sup.

Everything worked as expected until yesterday, when I had to reinstall the master node's operating system. For example, I have Redis installed on the master node and some pods on the worker nodes. My pods cannot connect to Redis via DNS at redis-master.database.svc.cluster.local (but they could the day before).

It throws an error that it cannot resolve the domain when I test with busybox like this:

kubectl run -it --rm --restart=Never busybox --image=busybox:1.28 -- nslookup redis-master.database.svc.cluster.local

When I try to ping my service by its IP (also with busybox):

kubectl run -it --rm --restart=Never busybox --image=busybox:1.28 -- ping 10.43.115.159

It shows 100% packet loss.
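
(Note: a ClusterIP is virtual and usually doesn't answer ICMP at all, so a TCP check against the actual service port is more telling. Assuming the busybox image's nc applet supports -z/-v, something like this should report whether the port is reachable:)

kubectl run -it --rm --restart=Never busybox --image=busybox:1.28 -- nc -zv -w 2 10.43.115.159 6379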

I'm able to work around the DNS issue simply by editing the CoreDNS config (changing the line forward . /etc/resolv.conf to forward . 192.168.1.101), but I don't think that's a good solution, since I didn't have to do that before.
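
For reference, that change is made in the coredns ConfigMap in kube-system; the Corefile shown here is abbreviated and the exact contents may differ between k3s versions:

kubectl -n kube-system edit configmap coredns

.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    # default: forward . /etc/resolv.conf
    forward . 192.168.1.101
    cache 30
    loop
    reload
    loadbalance
}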

Even so, that only fixes the domain-to-IP mapping; connecting via the IP still doesn't work.

My nodes:

NAME     STATUS   ROLES    AGE   VERSION         INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                       KERNEL-VERSION   CONTAINER-RUNTIME
node-4   Ready    <none>   10h   v1.19.15+k3s2   192.168.1.105   <none>        Debian GNU/Linux 10 (buster)   5.10.60-v8+      containerd://1.4.11-k3s1
node-3   Ready    <none>   10h   v1.19.15+k3s2   192.168.1.104   <none>        Debian GNU/Linux 10 (buster)   5.10.60-v8+      containerd://1.4.11-k3s1
node-1   Ready    <none>   10h   v1.19.15+k3s2   192.168.1.102   <none>        Debian GNU/Linux 10 (buster)   5.10.60-v8+      containerd://1.4.11-k3s1
node-0   Ready    master   10h   v1.19.15+k3s2   192.168.1.101   <none>        Debian GNU/Linux 10 (buster)   5.10.63-v8+      containerd://1.4.11-k3s1
node-2   Ready    <none>   10h   v1.19.15+k3s2   192.168.1.103   <none>        Debian GNU/Linux 10 (buster)   5.10.60-v8+      containerd://1.4.11-k3s1

The master node has a taint: role=master:NoSchedule.
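
(For completeness, a taint like that would have been applied with:)

kubectl taint nodes node-0 role=master:NoSchedule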

Any ideas?

UPDATE 1

I'm able to connect into the redis pod. Here is /etc/resolv.conf from redis-master-0:

search database.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.43.0.10
options ndots:5
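
(Read from inside the pod with something along the lines of:)

kubectl exec -n database redis-master-0 -- cat /etc/resolv.conf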

All Kubernetes services:

NAMESPACE       NAME                    TYPE           CLUSTER-IP      EXTERNAL-IP                                               PORT(S)                      AGE
default         kubernetes              ClusterIP      10.43.0.1       <none>                                                    443/TCP                      6d9h
kube-system     traefik-prometheus      ClusterIP      10.43.94.137    <none>                                                    9100/TCP                     6d8h
registry        proxy-docker-registry   ClusterIP      10.43.16.139    <none>                                                    5000/TCP                     6d8h
kube-system     kube-dns                ClusterIP      10.43.0.10      <none>                                                    53/UDP,53/TCP,9153/TCP       6d9h
kube-system     metrics-server          ClusterIP      10.43.101.30    <none>                                                    443/TCP                      6d9h
database        redis-headless          ClusterIP      None            <none>                                                    6379/TCP                     5d19h
database        redis-master            ClusterIP      10.43.115.159   <none>                                                    6379/TCP                     5d19h
kube-system     traefik                 LoadBalancer   10.43.221.89    192.168.1.102,192.168.1.103,192.168.1.104,192.168.1.105   80:30446/TCP,443:32443/TCP   6d8h

Solution

  • There was one more thing I didn't mention: I'm using OpenVPN with the NordVPN server list on the master node, and Privoxy on the worker nodes.

    When you install and run OpenVPN before starting the Kubernetes master, OpenVPN adds firewall rules that block Kubernetes networking. As a result, CoreDNS does not work and you can't reach any pod via its IP either.
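
    One way to confirm this (the exact chain names will vary per setup) is to dump the iptables rules on the master node and look for the VPN's blanket drop/reject rules alongside the KUBE-* chains that kube-proxy manages:

    # on the master node
    sudo iptables -S                        # filter table: look for VPN drop/reject rules
    sudo iptables -t nat -S | grep KUBE     # kube-proxy's service NAT chains should be here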

    I'm using an RPi 4 cluster, so for me it was good enough to just reinstall the master node, install Kubernetes first, and only then configure OpenVPN. Now everything works as expected.

    Alternatively, it's enough to order your systemd units by adding After= or Before= to the unit definition. My VPN systemd service looks like the following:

    [Unit]
    Description=Enable VPN for System
    After=network.target
    After=k3s.service
    
    [Service]
    Type=simple
    ExecStart=/etc/openvpn/start-nordvpn-server.sh
    
    [Install]
    WantedBy=multi-user.target
    

    This guarantees that the VPN starts only after Kubernetes.
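
    After dropping the unit file in place (for example /etc/systemd/system/nordvpn.service; the name and path here are only placeholders), reload systemd, enable the unit, and check its ordering:

    sudo systemctl daemon-reload
    sudo systemctl enable --now nordvpn.service
    systemctl show -p After nordvpn.service   # should list k3s.service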