Search code examples
amazon-web-serviceskubernetesamazon-ekscalico

Kubernetes API server failed to resolve webhook on EKS with Calico CNI


I'm trying to deploy an application in AWS EKS. I have created an EKS cluster with Calico CNI by following the official Calico documentation. I have also installed the AWS load balancer controller by following the docs here.

Here is my cluster, deployment, and ingress config file.

cluster.yaml

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: clustername
  region: us-east-2

nodeGroups:
  - name: ng1
    instanceType: t3.medium
    desiredCapacity: 1
    volumeSize: 30
    maxPodsPerNode: 250
    ami: auto
    ssh:
      publicKeyName: keyname

deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: my_namspace
  name: deployment-2048
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: app-2048
  replicas: 1
  template:
    metadata:
      labels:
        app.kubernetes.io/name: app-2048
    spec:
      containers:
      - image: alexwhen/docker-2048
        imagePullPolicy: Always
        name: app-2048
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  namespace: my_namspace
  name: service-2048
spec:
  ports:
    - port: 80
      targetPort: 80
      protocol: TCP
  type: NodePort
  selector:
    app.kubernetes.io/name: app-2048

ingress.yaml

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my_namspace-ingress
  namespace: my_namspace
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
spec:
  rules:
    - host: domain.io
      http:
        paths:
        - path: /*
          pathType: ImplementationSpecific
          backend:
            service:
              name: service-2048
              port:
                number: 80

kubectl get pods --namespace kube-system -o wide

NAME                                            READY   STATUS    RESTARTS   AGE    IP              NODE                                          NOMINATED NODE   READINESS GATES
aws-load-balancer-controller-568d85bd58-6jpk5   1/1     Running   0          74m    172.16.22.4     ip-192-168-32-46.us-east-2.compute.internal   <none>           <none>
aws-load-balancer-controller-568d85bd58-ph44m   1/1     Running   0          74m    172.16.22.5     ip-192-168-32-46.us-east-2.compute.internal   <none>           <none>
calico-kube-controllers-6fd7b9848d-8lw4t        1/1     Running   0          91m    172.16.22.3     ip-192-168-32-46.us-east-2.compute.internal   <none>           <none>
calico-node-xdw2h                               1/1     Running   0          87m    192.168.32.46   ip-192-168-32-46.us-east-2.compute.internal   <none>           <none>
coredns-f47955f89-5qwh4                         1/1     Running   0          110m   172.16.22.2     ip-192-168-32-46.us-east-2.compute.internal   <none>           <none>
coredns-f47955f89-qfpbl                         1/1     Running   0          111m   172.16.22.1     ip-192-168-32-46.us-east-2.compute.internal   <none>           <none>
kube-proxy-bnw6v                                1/1     Running   0          87m    192.168.32.46   ip-192-168-32-46.us-east-2.compute.internal   <none>           <none>

As you can see, everything is running smoothly. The problem is that when I tried to apply my ingress with kubectl apply -f ingress.yaml

Error from server (InternalError): error when creating "ingress-alb.yaml": Internal error occurred: failed calling webhook "vingress.elbv2.k8s.aws": Post "https://aws-load-balancer-webhook-service.kube-system.svc:443/validate-networking-v1-ingress?timeout=10s": Address is not allowed

I have learned from here that this is a common problem with calico on EKS, and also tried to follow the provided solution of using hostNetwork: true in the deployment file as well as in the load balancer controller.

helm upgrade aws-load-balancer-controller eks/aws-load-balancer-controller \
  -n kube-system \
  --set clusterName=clustername \
  --set serviceAccount.create=false \
  --set serviceAccount.name=aws-load-balancer-controller \
  --set hostNetwork=true

But the response is the same. Somehow the solution that worked for others doesn't work for me. Perhaps I missed something, I would really love to find that out.


Solution

  • I recommend that you should not use the Calico CNI method along with Amazon EKS service.

    Last year, I have spent time looking on this solution and contacted AWS specialist team on resolving a use case with this method. After a while, there is a big NOTE on Calico official site as below. I believe that EKS control plane (as well as GKE control plane), both cannot validate the NATed IP address of your pods. The IP range that both Cloud vendors managed control plane are within the VPC.

    enter image description here

    Reference: https://projectcalico.docs.tigera.io/getting-started/kubernetes/managed-public-cloud/eks