amazon-web-services, kubectl, amazon-eks, eksctl

AWS EKS - Failure creating load balancer controller


I am trying to create an AWS Load Balancer Controller on my EKS cluster by following this link.

When I run these steps (after making the necessary changes to the downloaded YAML file):

curl -o v2_1_2_full.yaml https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.1.2/docs/install/v2_1_2_full.yaml
kubectl apply -f v2_1_2_full.yaml

I get this output

customresourcedefinition.apiextensions.k8s.io/targetgroupbindings.elbv2.k8s.aws configured
mutatingwebhookconfiguration.admissionregistration.k8s.io/aws-load-balancer-webhook configured
role.rbac.authorization.k8s.io/aws-load-balancer-controller-leader-election-role unchanged
clusterrole.rbac.authorization.k8s.io/aws-load-balancer-controller-role configured
rolebinding.rbac.authorization.k8s.io/aws-load-balancer-controller-leader-election-rolebinding unchanged
clusterrolebinding.rbac.authorization.k8s.io/aws-load-balancer-controller-rolebinding unchanged
service/aws-load-balancer-webhook-service unchanged
deployment.apps/aws-load-balancer-controller unchanged
validatingwebhookconfiguration.admissionregistration.k8s.io/aws-load-balancer-webhook configured
Error from server (InternalError): error when creating "v2_1_2_full.yaml": Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s: no endpoints available for service "cert-manager-webhook"
Error from server (InternalError): error when creating "v2_1_2_full.yaml": Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s: no endpoints available for service "cert-manager-webhook"

The load balancer controller doesn't appear to start up because of this and never reaches the ready state.
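
From the error it looks like the cert-manager webhook Service has no backing pods at all. Checks along these lines (assuming cert-manager is installed in the usual cert-manager namespace, as the error message suggests) should confirm whether that is the case:

kubectl get endpoints cert-manager-webhook -n cert-manager
kubectl get pods -n cert-manager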

Has anyone any suggestions on how to resolve this issue?


Solution

  • It turned out that the taints on my nodegroup prevented the cert-manager pods from being scheduled on any node.

    These commands helped me debug the issue and led me to the fix:

    kubectl get po -n cert-manager
    kubectl describe po <pod id> -n cert-manager
    

    My solution was to create another nodeGroup with no taints specified (see the sketch below). This allowed the cert-manager pods to be scheduled and run.
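
    For reference, this is roughly what the setup looks like as an eksctl config. It is a minimal sketch only: the cluster name, region, instance types, capacities and taint key/values are made-up placeholders, and the taints syntax assumes a reasonably recent eksctl using managed node groups.

    apiVersion: eksctl.io/v1alpha5
    kind: ClusterConfig

    metadata:
      name: my-cluster          # placeholder cluster name
      region: eu-west-1         # placeholder region

    managedNodeGroups:
      # Existing group whose taints kept the cert-manager pods stuck in Pending
      - name: tainted-workers
        instanceType: t3.medium
        desiredCapacity: 2
        taints:
          - key: dedicated      # placeholder taint
            value: app
            effect: NoSchedule

      # Additional group with no taints, so cert-manager (and any other pod
      # without matching tolerations) has somewhere to schedule
      - name: untainted-workers
        instanceType: t3.medium
        desiredCapacity: 1

    After creating the new group (for example with eksctl create nodegroup --config-file=<file>) and waiting for the cert-manager pods to reach Running, re-running kubectl apply -f v2_1_2_full.yaml should go through without the webhook errors.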