kubernetes windows-subsystem-for-linux

How to run Kubernetes on multiple nodes under WSL2 when the WSL2 IP addresses are unreachable?


I'm trying to run the control plane inside WSL2, with worker nodes under WSL2 on other machines. Because the nodes run inside WSL2, they only discover their own private IP addresses and try to use those, which doesn't work because other nodes cannot reach those private WSL2 IP addresses.

I tried setting up port forwarding with netsh interface portproxy... calls on the control host and configuring the control plane to use the host IP address, but it still sent configuration to the worker nodes telling them to use the private IP. After finding a few config files that still had the WSL2 virtual IP and changing them to point to the host IP, I then saw a lot of errors in syslog like "failed to validate nodeIP: node IP: \"192.168.0.100\" not found in the host's network interfaces", because inside WSL2 that IP address is unknown.
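For reference, the port forwarding attempt described above might be generated with something like the following sketch. It only builds the command strings; they would still have to be run from an elevated prompt on the Windows host. The helper name, the port list (6443 and 10250) and the 172.20.0.2 address are assumptions for illustration, not values taken from the setup above.

```python
# Sketch: build netsh portproxy commands that forward host ports into WSL2.
# The ports and the WSL2 address used below are illustrative assumptions.

def portproxy_commands(wsl_ip, ports, listen_address="0.0.0.0"):
    """Return one 'netsh interface portproxy add v4tov4' command per port."""
    return [
        "netsh interface portproxy add v4tov4 "
        f"listenaddress={listen_address} listenport={port} "
        f"connectaddress={wsl_ip} connectport={port}"
        for port in ports
    ]


# 172.20.0.2 stands in for the WSL2 virtual address (hypothetical).
for command in portproxy_commands("172.20.0.2", [6443, 10250]):
    print(command)
```

Forwarding like this only makes the ports reachable from the LAN. It does not change which addresses the node sees on its own interfaces, which is why the nodeIP validation error still appears.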

Is there a way to make this work?

Note: I'm looking for a solution for WSL2, where the networking works differently from WSL1.


Solution

  • This turns out to be two questions in one. For the Kubernetes side of it, the answer is probably:

    1. Consider a simplified distribution such as microk8s or k3s for the local case. It's a lot simpler.
    2. Otherwise, go through Kubernetes the Hard Way; the information is probably in there.

    For the WSL2 side of the question, there are a number of answers on this question that may help configure WSL2 to have its own LAN-reachable IP address.

    Update:

    microk8s was very straightforward to install on both control nodes and worker nodes.

    The other Stack Overflow question linked above covers the networking: install Hyper-V and change the WSL virtual switch to external. I'm guessing the people who couldn't get it to work ran into some of these fiddly details:

    1. WSL must have been run at least once since the last boot, but all instances must be terminated (wsl --terminate {instancename}).
    2. Hyper-V must be run as administrator.
    3. Hyper-V must be started AFTER you have terminated WSL.

    So, if it doesn't work, make sure WSL is terminated, and then restart Hyper-V as admin. Sometimes a delay is needed between terminating WSL and starting Hyper-V. I think this can be automated; see: https://superuser.com/a/1790350/35726
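The terminate-then-wait part of that sequence could be scripted roughly as follows. This is a minimal sketch and not the method from the linked answer: the function names are made up, the 5 second delay is a guess (the required delay isn't known), and the Hyper-V step itself is left as a manual action.

```python
# Sketch: terminate WSL instances, then pause before starting Hyper-V.
import subprocess
import time


def terminate_commands(instance_names):
    """Return the 'wsl --terminate' argument list for each named instance."""
    return [["wsl", "--terminate", name] for name in instance_names]


def terminate_and_wait(instance_names, delay_seconds=5.0, run=subprocess.run):
    """Terminate each instance, then sleep for delay_seconds.

    delay_seconds is an assumed value; adjust it if Hyper-V still
    fails to pick up the switch change.
    """
    for argv in terminate_commands(instance_names):
        run(argv, check=True)
    time.sleep(delay_seconds)


# Example call on the Windows host ("Ubuntu" is a hypothetical instance name):
# terminate_and_wait(["Ubuntu"])
# After it returns, start Hyper-V as administrator.
```

The `run` parameter exists only so the sequence can be dry-run or tested without actually calling wsl.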