I've been trying to get past this, but I'm out of ideas for now, hence I'm posting the question here.
I'm experimenting with the Oracle Cloud Infrastructure (OCI) and I wanted to create a Kubernetes cluster which exposes some service.
The goal is:
The infra looks like the following:
And now I'm at the point where I want to expose the service running on port 3000.
I have 2 obvious choices:
1. Create a LoadBalancer type service, which provisions a classic OCI Load Balancer in front of the cluster.
2. Create a NodePort type service and expose it through an OCI Network Load Balancer (NLB).
The first one works perfectly fine, but I want to run this cluster with minimal costs, so I decided to experiment with option 2, the NLB, since it's way cheaper (zero cost).
Long story short, everything works and I can access the NextJS app on the IP of the NLB most of the time, but sometimes I can't. I decided to look into what's going on, and it turned out that the NodePort I exposed in the cluster isn't working the way I'd imagined.
The service behind the NodePort is only accessible on the Node that's running the pod in K8S. Assume NodeA is running the service and NodeB is just there chilling. If I try to hit the service on NodeA, everything is fine. But when I try to do the same on NodeB, I don't get a response at all.
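To illustrate the behaviour, hitting the NodePort on each node looks roughly like this (the node IPs below are just placeholders, not my real ones):

# NodeA runs the frontend pod, NodeB does not (placeholder public IPs)
curl -m 5 http://192.0.2.10:31600/   # NodeA: returns the NextJS page
curl -m 5 http://192.0.2.11:31600/   # NodeB: hangs until the timeout, no response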
That's my problem and I couldn't figure out what could be the issue.
What I've tried so far:
Edit: Some update. I've tried using a DaemonSet instead of a regular Deployment for the pod to ensure, as a temporary solution, that all nodes are running at least one instance of the pod, and surprise: the node that was previously not responding to requests on that specific port still doesn't, even though a pod is now running on it.
Edit2: Originally I was running the latest K8S version on the cluster (v1.21.5); I tried downgrading to v1.20.11, but unfortunately the issue is still present.
Edit3: I checked whether the NodePort is open on the node that's not responding, and it is; at least kube-proxy is listening on it.
tcp 0 0 0.0.0.0:31600 0.0.0.0:* LISTEN 16671/kube-proxy
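(For reference, the line above is from a netstat-style listing; an equivalent check would be something like the following.)

sudo netstat -plnt | grep 31600   # is anything listening on the NodePort?
sudo ss -tlnp | grep 31600        # same check using ss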
Edit4: Tried adding permissive iptables rules (setting the default chain policies to ACCEPT), but it didn't change anything.
[opc@oke-cdvpd5qrofa-nyx7mjtqw4a-svceq4qaiwq-0 ~]$ sudo iptables -P FORWARD ACCEPT
[opc@oke-cdvpd5qrofa-nyx7mjtqw4a-svceq4qaiwq-0 ~]$ sudo iptables -P INPUT ACCEPT
[opc@oke-cdvpd5qrofa-nyx7mjtqw4a-svceq4qaiwq-0 ~]$ sudo iptables -P OUTPUT ACCEPT
Edit5: Just as a trial, I created a LoadBalancer once more to verify whether I've gone completely mental and simply didn't notice this error before, or whether it really works. Funny thing: it works perfectly fine through the classic load balancer's IP. But when I try to send a request to the nodes directly on the port that was opened for the load balancer (it's 30679 for now), I get a response only from the node that's running the pod. From the other, still nothing; yet through the load balancer, I get 100% successful responses.
Bonus: here are the iptables rules from the node that's not responding to requests; I'm not too sure what to look for:
[opc@oke-cn44eyuqdoq-n3ewna4fqra-sx5p5dalkuq-1 ~]$ sudo iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
KUBE-NODEPORTS all -- anywhere anywhere /* kubernetes health check service ports */
KUBE-EXTERNAL-SERVICES all -- anywhere anywhere ctstate NEW /* kubernetes externally-visible service portals */
KUBE-FIREWALL all -- anywhere anywhere
Chain FORWARD (policy ACCEPT)
target prot opt source destination
KUBE-FORWARD all -- anywhere anywhere /* kubernetes forwarding rules */
KUBE-SERVICES all -- anywhere anywhere ctstate NEW /* kubernetes service portals */
KUBE-EXTERNAL-SERVICES all -- anywhere anywhere ctstate NEW /* kubernetes externally-visible service portals */
ACCEPT all -- 10.244.0.0/16 anywhere
ACCEPT all -- anywhere 10.244.0.0/16
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
KUBE-SERVICES all -- anywhere anywhere ctstate NEW /* kubernetes service portals */
KUBE-FIREWALL all -- anywhere anywhere
Chain KUBE-EXTERNAL-SERVICES (2 references)
target prot opt source destination
Chain KUBE-FIREWALL (2 references)
target prot opt source destination
DROP all -- anywhere anywhere /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000
DROP all -- !loopback/8 loopback/8 /* block incoming localnet connections */ ! ctstate RELATED,ESTABLISHED,DNAT
Chain KUBE-FORWARD (1 references)
target prot opt source destination
DROP all -- anywhere anywhere ctstate INVALID
ACCEPT all -- anywhere anywhere /* kubernetes forwarding rules */ mark match 0x4000/0x4000
ACCEPT all -- anywhere anywhere /* kubernetes forwarding conntrack pod source rule */ ctstate RELATED,ESTABLISHED
ACCEPT all -- anywhere anywhere /* kubernetes forwarding conntrack pod destination rule */ ctstate RELATED,ESTABLISHED
Chain KUBE-KUBELET-CANARY (0 references)
target prot opt source destination
Chain KUBE-NODEPORTS (1 references)
target prot opt source destination
Chain KUBE-PROXY-CANARY (0 references)
target prot opt source destination
Chain KUBE-SERVICES (2 references)
target prot opt source destination
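(Note that the listing above only covers the filter table; the NodePort DNAT rules that kube-proxy programs live in the nat table, which could be inspected with something along these lines.)

sudo iptables -t nat -L KUBE-NODEPORTS -n | grep 31600   # NodePort DNAT entries for this service
sudo iptables-save -t nat | grep 31600                   # all nat rules mentioning the port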
Service spec (the running one, since it was generated using Terraform):
{
  "apiVersion": "v1",
  "kind": "Service",
  "metadata": {
    "creationTimestamp": "2022-01-28T09:13:33Z",
    "name": "web-staging-service",
    "namespace": "web-staging",
    "resourceVersion": "22542",
    "uid": "c092f99b-7c72-4c32-bf27-ccfa1fe92a79"
  },
  "spec": {
    "clusterIP": "10.96.99.112",
    "clusterIPs": [
      "10.96.99.112"
    ],
    "externalTrafficPolicy": "Cluster",
    "ipFamilies": [
      "IPv4"
    ],
    "ipFamilyPolicy": "SingleStack",
    "ports": [
      {
        "nodePort": 31600,
        "port": 3000,
        "protocol": "TCP",
        "targetPort": 3000
      }
    ],
    "selector": {
      "app": "frontend"
    },
    "sessionAffinity": "None",
    "type": "NodePort"
  },
  "status": {
    "loadBalancer": {}
  }
}
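(And just to confirm the selector actually matches the pod, the service's endpoints can be checked with something like this; the name and namespace are taken from the spec above.)

kubectl -n web-staging get endpoints web-staging-service -o wide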
Any ideas are appreciated. Thanks guys.
Might not be the ideal fix, but can you try changing the externalTrafficPolicy to Local? With Local, the health check fails on the nodes which don't run the application, so traffic will only be forwarded to the nodes where the application is running. Setting externalTrafficPolicy to Local is also a requirement for preserving the source IP of the connection. Also, can you share the health check config for both the NLB and the LB that you are using? When you change the externalTrafficPolicy, note that the health check for the LB will change, and the same needs to be applied to the NLB.
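For example, something along these lines should flip the policy (service name and namespace taken from your spec above):

kubectl -n web-staging patch svc web-staging-service -p '{"spec":{"externalTrafficPolicy":"Local"}}'
kubectl -n web-staging get svc web-staging-service -o jsonpath='{.spec.externalTrafficPolicy}'

With Local, only nodes that actually run a pod will answer on the NodePort, so the NLB's health check should take the other nodes out of rotation.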
Edit: Also note that you need a security list / network security group added to your node subnet/nodepool which allows traffic on all protocols from the worker node subnet.
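A quick way to check whether inter-node traffic is being blocked is to hit the NodePort from one worker node against the other node's private IP (the address below is a placeholder):

# run from one worker node, targeting the other worker node's private IP
curl -m 5 http://10.0.10.11:31600/

With externalTrafficPolicy set to Cluster, the node without the pod has to forward (and SNAT) the request to the pod on the other node, so if the node subnet's security list drops node-to-node traffic, the request hangs exactly like you describe.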