docker network-programming kubernetes rke

No outbound networking on Kubernetes pods


I am running a one-node Kubernetes cluster in a VM for development and testing purposes. I used Rancher Kubernetes Engine (RKE, Kubernetes version 1.18) to deploy it and MetalLB to enable the LoadBalancer service type. Traefik is version 2.2, deployed via the official Helm chart (https://github.com/containous/traefik-helm-chart). I have a few dummy containers deployed to test the setup (https://hub.docker.com/r/errm/cheese).

I can access the Traefik dashboard just fine through the node's IP (so MetalLB seems to work). It registers the services and routes for the test containers. Everything looks fine, but when I try to access the test containers in my browser I get a 502 Bad Gateway error.

Some probing showed that there seems to be an issue with outbound traffic from the pods. When I SSH into the node I can reach all pods by their service or pod IP. DNS from node to pod works as well. However, if I start an interactive busybox pod I can't reach any other pod or host from there. When I run wget against any other container (all in the default namespace) I only get "wget: can't connect to remote host (10.42.0.7): No route to host". The same is true for servers on the internet.
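
To make the probing above repeatable, here is a minimal sketch of a TCP check that separates "No route to host" from a refused connection or a timeout. It assumes a Python interpreter is available in the test pod (busybox itself does not ship one), and the function name probe is made up for illustration.

```python
import errno
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Try one TCP connection and classify the outcome (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except socket.timeout:
        # No answer at all within the timeout.
        return "timeout"
    except ConnectionRefusedError:
        # The host was reached, but nothing listens on that port.
        return "refused"
    except OSError as exc:
        # EHOSTUNREACH is the errno behind wget's "No route to host".
        if exc.errno in (errno.EHOSTUNREACH, errno.ENETUNREACH):
            return "no-route"
        return f"error: {exc}"
```

Calling probe("10.42.0.7", 80) from inside a pod and again from the node would show whether the failure is specific to pod traffic, which is what the wget test above suggested.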

I have not installed any network policies and there are none installed by default that I am aware of.

I have also gone through this: https://kubernetes.io/docs/tasks/debug-application-cluster/debug-service

Everything in the guide is working fine, except that the pods don't seem to have any network connectivity whatsoever.

My RKE config is standard, except that I turned off the standard Nginx ingress and enabled etcd encryption-at-rest.

Any ideas?


Solution

  • Ok, so I was being stupid (or rather: a noob). I had an old iptables rule lying around on the host dropping all traffic on the FORWARD chain... removing that rule fixed the problem.

    I feel a bit uneasy about just removing that rule, because I have to admit that I don't fully understand the security implications. This might take some further research, but that's another topic. And since I'm not currently planning to run this cluster in production but rather use a hosted cluster, it's not really a problem anyways.
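
For anyone hunting for a similar leftover rule, the following is a rough sketch of how the FORWARD chain could be scanned for blanket blocking rules. It assumes the input is the text printed by iptables -S FORWARD on the node. The function name blanket_forward_blocks is hypothetical, and the exact rule from the answer above is not known, so the sample rules in the comments are only illustrative.

```python
import shlex
from typing import List

# Targets that stop forwarded traffic.
BLOCKING = {"DROP", "REJECT"}


def blanket_forward_blocks(rules_text: str) -> List[str]:
    """Return FORWARD rules that block traffic without any match condition.

    Expects text in the format of `iptables -S FORWARD`, for example:
        -P FORWARD ACCEPT
        -A FORWARD -j DROP
    """
    hits = []
    for line in rules_text.splitlines():
        tokens = shlex.split(line)
        if tokens[:2] == ["-P", "FORWARD"]:
            # Chain policy, e.g. "-P FORWARD DROP".
            if len(tokens) == 3 and tokens[2] in BLOCKING:
                hits.append(line.strip())
        elif tokens[:2] == ["-A", "FORWARD"]:
            rest = tokens[2:]
            # Unconditional rule: the jump comes first, with no matches before it.
            if len(rest) >= 2 and rest[0] == "-j" and rest[1] in BLOCKING:
                hits.append(line.strip())
    return hits
```

On the node, the input could come from running iptables -S FORWARD as root. A hit such as "-A FORWARD -j DROP" can then be deleted with iptables -D FORWARD -j DROP. A DROP policy is not necessarily wrong on its own, since the network plugin may add its own ACCEPT rules to the chain, so treat the output as a list of candidates to look at rather than a verdict.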