
Target health check fails - AWS Network Load Balancer


NOTE: I tried to include screenshots, but Stack Overflow does not allow me to embed images with previews, so I included them as links.

I deployed a web app on AWS using kOps. I have two nodes and set up a Network Load Balancer.

The target group of the NLB has two nodes (each node is an instance launched from the same template).

The load balancer appears to be working: the ingress-nginx-controller logs show requests being distributed across the pods correctly, and I can access the service via the ingress external address. However, in the AWS Console under Target Groups, one of the two nodes is marked as unhealthy, and that concerns me.

Nodes are running correctly.

I opened a shell in the nginx-controller pod and ran curl against both nodes using their internal IP addresses. For the healthy node, I get an nginx response; for the unhealthy node, the request times out. I do not understand why nginx would be running on one node and not on the other.

Could anybody let me know the possible reasons?


Solution

  • I had exactly the same problem before, and this should be documented somewhere on AWS or Kubernetes. The answer below is copied from AWS Premium Support.

    Short description

    The NGINX Ingress Controller sets the spec.externalTrafficPolicy option to Local to preserve the client IP. Also, requests aren't routed to unhealthy worker nodes. The following resolution assumes that you don't need to maintain the cluster IP address or preserve the client IP address.

    Resolution

    If you check the ingress controller service, you will see the External Traffic Policy field set to Local:

    $ kubectl -n ingress-nginx describe svc ingress-nginx-controller
    
    Output:
    
    Name:                     ingress-nginx-controller
    Namespace:                ingress-nginx
    ...
    External Traffic Policy:  Local
    ...
    

    This Local setting drops packets that are sent to Kubernetes nodes that aren't running instances of the NGINX Ingress Controller. Assign NGINX pods (from the Kubernetes website) to the nodes that you want to schedule the NGINX Ingress Controller on.
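
One way to see which nodes can pass the health check under the Local policy is to list the nodes that currently host a controller pod. The helper below is a minimal sketch, not part of the AWS answer: the function name `controller_nodes` is made up for illustration, and the label selector `app.kubernetes.io/component=controller` assumes the labels applied by the standard ingress-nginx manifests, so adjust it if your deployment labels its pods differently.

```shell
# Print each node that hosts at least one ingress-nginx controller pod.
# Assumption: controller pods carry the label app.kubernetes.io/component=controller.
controller_nodes() {
  kubectl -n ingress-nginx get pods \
    -l app.kubernetes.io/component=controller \
    -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' |
    sort -u
}
```

Comparing the output of `controller_nodes` with `kubectl get nodes` should show whether the node marked unhealthy in the target group is the one without a controller pod.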

    Update the spec.externalTrafficPolicy option to Cluster:

    $ kubectl -n ingress-nginx patch service ingress-nginx-controller -p '{"spec":{"externalTrafficPolicy":"Cluster"}}'
    
    Output:
    service/ingress-nginx-controller patched
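
To confirm that the patch took effect, the policy can be read back from the service. This is a small sketch under the same assumptions as the commands above (namespace `ingress-nginx`, service `ingress-nginx-controller`); the function names `current_traffic_policy` and `policy_is_cluster` are hypothetical helpers, not kubectl features.

```shell
# Print the current externalTrafficPolicy of the ingress controller service.
current_traffic_policy() {
  kubectl -n ingress-nginx get service ingress-nginx-controller \
    -o jsonpath='{.spec.externalTrafficPolicy}'
}

# Succeed (exit status 0) only if the policy is Cluster.
policy_is_cluster() {
  [ "$(current_traffic_policy)" = "Cluster" ]
}
```

For example, `policy_is_cluster && echo "policy is Cluster"` prints the message only after a successful patch. Once the policy is Cluster, every node forwards traffic to a controller pod, so both targets should pass the NLB health check after a few check intervals.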
    

    By default, NodePort services perform source address translation (from the Kubernetes website). For NGINX, this means that the source IP of an HTTP request is always the IP address of the Kubernetes node that received the request. If you set the externalTrafficPolicy field in the ingress-nginx service specification to Cluster, then you can't preserve the source IP address.