
Kubernetes NodePort is not available on all nodes - Oracle Cloud Infrastructure (OCI)


I've been trying to get past this, but I'm out of ideas for now, hence I'm posting the question here.

I'm experimenting with the Oracle Cloud Infrastructure (OCI) and I wanted to create a Kubernetes cluster which exposes some service.

The goal is:

  • A running managed Kubernetes cluster (OKE)
  • 2 nodes at least
  • 1 service that's accessible for external parties

The infra looks like the following:

  • A VCN for the whole thing
  • A private subnet on 10.0.1.0/24
  • A public subnet on 10.0.0.0/24
  • NAT gateway for the private subnet
  • Internet gateway for the public subnet
  • Service gateway
  • The corresponding security lists for both subnets which I won't share right now unless somebody asks for it
  • A containerengine K8S (OKE) cluster in the VCN with public Kubernetes API enabled
  • A node pool for the K8S cluster with 2 availability domains and with 2 instances right now. The instances are ARM machines with 1 OCPU and 6GB RAM running Oracle-Linux-7.9-aarch64-2021.12.08-0 images.
  • A namespace in the K8S cluster (call it staging for now)
  • A deployment which refers to a custom NextJS application serving traffic on port 3000 (a rough sketch follows right below)
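
For reference, the deployment is roughly the following. This is just a sketch: the image and resource names are placeholders, only the port 3000, the staging namespace, and the app: frontend label (the selector used by the service further down) match the real setup.

kubectl -n staging apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
spec:
  replicas: 2
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
        - name: frontend
          # placeholder image, the real one lives in a private registry
          image: myregistry.example.com/frontend:latest
          ports:
            - containerPort: 3000
EOF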

And now it's the point where I want to expose the service running on port 3000.

I have 2 obvious choices:

  • Create a LoadBalancer service in K8S, which will spawn a classic Load Balancer in OCI, set up its listener, and set up the backend set referring to the 2 nodes in the cluster; plus it adjusts the subnet security lists to make sure traffic can flow (a minimal example follows this list)
  • Create a Network Load Balancer in OCI and create a NodePort on K8S and manually configure the NLB to the ~same settings as the classic Load Balancer
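
For completeness, option 1 is essentially the same service with type LoadBalancer, something like the sketch below (the namespace is the staging placeholder from above, the service name is taken from the actual spec further down):

kubectl -n staging apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: web-staging-service
spec:
  type: LoadBalancer
  selector:
    app: frontend
  ports:
    - port: 3000
      targetPort: 3000
      protocol: TCP
EOF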

The first one works perfectly fine, but I want to run this cluster with minimal costs, so I decided to experiment with option 2, the NLB, since it's way cheaper (zero cost).

Long story short, everything works and I can access the NextJS app on the IP of the NLB most of the time, but sometimes I can't. I decided to look into what's going on, and it turned out the NodePort that I exposed in the cluster isn't working how I'd imagined.

The service behind the NodePort is only accessible on the Node that's running the pod in K8S. Assume NodeA is running the service and NodeB is just there chilling. If I try to hit the service on NodeA, everything is fine. But when I try to do the same on NodeB, I don't get a response at all.

That's my problem, and I couldn't figure out what the issue could be.

What I've tried so far:

  • Switching from ARM machines to AMD ones - no change
  • Created a bastion host in the public subnet to test which nodes respond to requests. It turned out that only the node running the pod responds (see the curl example right after this list).
  • Created a regular LoadBalancer in K8S with the same config as the NodePort (in this case OCI will create a classic Load Balancer), that works perfectly
  • Tried upgrading to Oracle 8.4 images for the K8S nodes, didn't fix it
  • Ran the Node Doctor on the nodes, everything is fine
  • Checked the logs of kube-proxy, kube-flannel, core-dns, no error
  • Since the cluster consists of 2 nodes, I gave it a try and added one more node and the service was not accessible on the new node either
  • Recreated the cluster from scratch
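
The bastion test mentioned in the list above was basically just this (the node private IPs are placeholders from the 10.0.1.0/24 subnet, 31600 is the NodePort from the service spec below):

# from the bastion host in the public subnet
curl -sv http://10.0.1.10:31600/   # node running the pod: responds fine
curl -sv http://10.0.1.11:31600/   # the other node: no response, the request just hangs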

Edit: Some update. I've tried using a DaemonSet instead of a regular Deployment for the pod to ensure that, as a temporary solution, all nodes are running at least one instance of the pod, and surprise: the node that was previously not responding to requests on that specific port still doesn't, even though a pod is running on it.
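
For reference, confirming that the DaemonSet really put a pod on every node is just:

kubectl -n staging get pods -o wide
# one frontend pod per node, all Running, yet the non-responding node still doesn't answer on the NodePort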

Edit2: Originally I was running the latest K8S version on the cluster (v1.21.5). I tried downgrading to v1.20.11, but unfortunately the issue is still present.

Edit3: Checked whether the NodePort is open on the node that's not responding, and it is; at least kube-proxy is listening on it.
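
The check on the node was something along these lines:

sudo netstat -tlnp | grep 31600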

tcp        0      0 0.0.0.0:31600           0.0.0.0:*               LISTEN      16671/kube-proxy

Edit4: Tried setting the iptables policies to ACCEPT across the board, but it didn't change anything.

[opc@oke-cdvpd5qrofa-nyx7mjtqw4a-svceq4qaiwq-0 ~]$ sudo iptables -P FORWARD ACCEPT
[opc@oke-cdvpd5qrofa-nyx7mjtqw4a-svceq4qaiwq-0 ~]$ sudo iptables -P INPUT ACCEPT
[opc@oke-cdvpd5qrofa-nyx7mjtqw4a-svceq4qaiwq-0 ~]$ sudo iptables -P OUTPUT ACCEPT

Edit5: Just as a trial, I created a LoadBalancer once more to verify whether I've gone completely mental and just didn't notice this error before, or it really works. Funny thing: it works perfectly fine through the classic load balancer's IP. But when I send a request to the nodes directly on the port that was opened for the load balancer (it's 30679 for now), I get a response only from the node that's running the pod; from the other one, still nothing. Yet through the load balancer, I get 100% successful responses.

Bonus: here's the iptables output from the node that's not responding to requests; not too sure what to look for:

[opc@oke-cn44eyuqdoq-n3ewna4fqra-sx5p5dalkuq-1 ~]$ sudo iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
KUBE-NODEPORTS  all  --  anywhere             anywhere             /* kubernetes health check service ports */
KUBE-EXTERNAL-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes externally-visible service portals */
KUBE-FIREWALL  all  --  anywhere             anywhere

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
KUBE-FORWARD  all  --  anywhere             anywhere             /* kubernetes forwarding rules */
KUBE-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes service portals */
KUBE-EXTERNAL-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes externally-visible service portals */
ACCEPT     all  --  10.244.0.0/16        anywhere
ACCEPT     all  --  anywhere             10.244.0.0/16

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
KUBE-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes service portals */
KUBE-FIREWALL  all  --  anywhere             anywhere

Chain KUBE-EXTERNAL-SERVICES (2 references)
target     prot opt source               destination

Chain KUBE-FIREWALL (2 references)
target     prot opt source               destination
DROP       all  --  anywhere             anywhere             /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000
DROP       all  -- !loopback/8           loopback/8           /* block incoming localnet connections */ ! ctstate RELATED,ESTABLISHED,DNAT

Chain KUBE-FORWARD (1 references)
target     prot opt source               destination
DROP       all  --  anywhere             anywhere             ctstate INVALID
ACCEPT     all  --  anywhere             anywhere             /* kubernetes forwarding rules */ mark match 0x4000/0x4000
ACCEPT     all  --  anywhere             anywhere             /* kubernetes forwarding conntrack pod source rule */ ctstate RELATED,ESTABLISHED
ACCEPT     all  --  anywhere             anywhere             /* kubernetes forwarding conntrack pod destination rule */ ctstate RELATED,ESTABLISHED

Chain KUBE-KUBELET-CANARY (0 references)
target     prot opt source               destination

Chain KUBE-NODEPORTS (1 references)
target     prot opt source               destination

Chain KUBE-PROXY-CANARY (0 references)
target     prot opt source               destination

Chain KUBE-SERVICES (2 references)
target     prot opt source               destination

Service spec (the running one, since the original was generated using Terraform):

{
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {
        "creationTimestamp": "2022-01-28T09:13:33Z",
        "name": "web-staging-service",
        "namespace": "web-staging",
        "resourceVersion": "22542",
        "uid": "c092f99b-7c72-4c32-bf27-ccfa1fe92a79"
    },
    "spec": {
        "clusterIP": "10.96.99.112",
        "clusterIPs": [
            "10.96.99.112"
        ],
        "externalTrafficPolicy": "Cluster",
        "ipFamilies": [
            "IPv4"
        ],
        "ipFamilyPolicy": "SingleStack",
        "ports": [
            {
                "nodePort": 31600,
                "port": 3000,
                "protocol": "TCP",
                "targetPort": 3000
            }
        ],
        "selector": {
            "app": "frontend"
        },
        "sessionAffinity": "None",
        "type": "NodePort"
    },
    "status": {
        "loadBalancer": {}
    }
}

Any ideas are appreciated. Thanks guys.


Solution

  • Might not be the ideal fix, but can you try changing the externalTrafficPolicy to Local? This prevents the health check from failing on the nodes that don't run the application; that way traffic is only forwarded to the nodes where the application is running. Setting externalTrafficPolicy to Local is also a requirement for preserving the source IP of the connection (a one-line example of the change is at the end of this answer). Also, can you share the health check config for both the NLB and the LB you are using? When you change the externalTrafficPolicy, note that the health check for the LB will change, and the same needs to be applied to the NLB.

    Edit: Also note that you need a security list / network security group added to your node subnet / node pool which allows traffic on all protocols from the worker node subnet.
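
    For example, on the service from the question this is a one-liner (name and namespace taken from the spec above). After the change, nodes without a local pod simply stop answering on the NodePort, so the NLB health check takes them out of rotation:

    kubectl -n web-staging patch svc web-staging-service \
      -p '{"spec":{"externalTrafficPolicy":"Local"}}'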