Tags: amazon-web-services, kubernetes, kubernetes-pod, docker-registry, aws-fargate

AWS EKS fargate coredns ImagePullBackOff


I'm trying to deploy a simple tutorial app to a new Fargate-based Kubernetes cluster.

Unfortunately I'm stuck on ImagePullBackOff for the coredns pod:

Events:
  Type     Reason           Age                  From               Message
  ----     ------           ----                 ----               -------
  Warning  LoggingDisabled  5m51s                fargate-scheduler  Disabled logging because aws-logging configmap was not found. configmap "aws-logging" not found
  Normal   Scheduled        4m11s                fargate-scheduler  Successfully assigned kube-system/coredns-86cb968586-mcdpj to fargate-ip-172-31-55-205.eu-central-1.compute.internal
  Warning  Failed           100s                 kubelet            Failed to pull image "602401143452.dkr.ecr.eu-central-1.amazonaws.com/eks/coredns:v1.8.0-eksbuild.1": rpc error: code = Unknown desc = failed to pull and unpack image "602401143452.dkr.ecr.eu-central-1.amazonaws.com/eks/coredns:v1.8.0-eksbuild.1": failed to resolve reference "602401143452.dkr.ecr.eu-central-1.amazonaws.com/eks/coredns:v1.8.0-eksbuild.1": failed to do request: Head "https://602401143452.dkr.ecr.eu-central-1.amazonaws.com/v2/eks/coredns/manifests/v1.8.0-eksbuild.1": dial tcp 3.122.9.124:443: i/o timeout
  Warning  Failed           100s                 kubelet            Error: ErrImagePull
  Normal   BackOff          99s                  kubelet            Back-off pulling image "602401143452.dkr.ecr.eu-central-1.amazonaws.com/eks/coredns:v1.8.0-eksbuild.1"
  Warning  Failed           99s                  kubelet            Error: ImagePullBackOff
  Normal   Pulling          87s (x2 over 4m10s)  kubelet            Pulling image "602401143452.dkr.ecr.eu-central-1.amazonaws.com/eks/coredns:v1.8.0-eksbuild.1"
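
The "dial tcp ...: i/o timeout" in the events points at a networking problem between the Fargate pod and ECR rather than a permissions problem. A minimal first step for narrowing it down (the pod name is the one from the events above) is to find out which subnet the pod landed in:

kubectl get pod -n kube-system coredns-86cb968586-mcdpj -o wide
# the IP column (172.31.55.205 above) tells you which VPC subnet the pod
# runs in; that subnet's route table is the one that needs a NAT route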

While googling I found https://aws.amazon.com/premiumsupport/knowledge-center/eks-ecr-troubleshooting/. It contains the following list:

To resolve this error, confirm the following:

 - The subnet for your worker node has a route to the internet. Check the route table associated with your subnet.
 - The security group associated with your worker node allows outbound internet traffic.
 - The ingress and egress rule for your network access control lists (ACLs) allows access to the internet.

Since I created both my private subnets and their NAT gateways manually, I tried to locate the issue there but couldn't find anything. The subnets, as well as the security groups and ACLs, look fine to me.
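
For reference, a minimal sketch of checking a subnet's route table from the CLI (the subnet ID is a placeholder; Fargate pods only get private IPs, so their subnet needs a 0.0.0.0/0 route to a NAT gateway):

# subnet ID is a placeholder - use the subnet the coredns pod was scheduled into
aws ec2 describe-route-tables \
  --filters "Name=association.subnet-id,Values=subnet-0123456789abcdef0" \
  --query "RouteTables[].Routes"
# expect an entry with DestinationCidrBlock 0.0.0.0/0 and a NatGatewayId target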


I even added the AmazonEC2ContainerRegistryReadOnly policy to my EKS role, but after issuing the command kubectl rollout restart -n kube-system deployment coredns the result is unfortunately the same: ImagePullBackOff
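
For completeness, attaching that policy from the CLI looks roughly like this (the role name is a placeholder; on Fargate it is the pod execution role that pulls images from ECR):

aws iam attach-role-policy \
  --role-name my-eks-cluster-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly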

Unfortunately I've run out of ideas and I'm stuck. Any hints that would help me troubleshoot this would be greatly appreciated. Thanks


edit>

After creating a new cluster via eksctl, as @mreferre suggested in his comment, I get an RBAC error with a link to https://docs.aws.amazon.com/eks/latest/userguide/troubleshooting_iam.html#security-iam-troubleshoot-cannot-view-nodes-or-workloads

I'm not sure what is going on, since I already have a full-access policy on my IAM user.
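
A quick way to check whether the CLI and the console are using the same identity (the usual cause of the RBAC error linked above), sketched with placeholder names:

aws sts get-caller-identity   # the Arn here is the identity eksctl used
# if the console user differs, map it into the cluster, e.g.
# (cluster/user names are placeholders):
eksctl create iamidentitymapping \
  --cluster tutorial-cluster \
  --region eu-central-1 \
  --arn arn:aws:iam::370179080679:user/console-user \
  --group system:masters \
  --username console-user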


edit>>

The cluster created via the AWS Console (web interface) doesn't have the aws-auth configmap. I retrieved the configmap below using the command kubectl edit configmap aws-auth -n kube-system:

apiVersion: v1
data:
  mapRoles: |
    - groups:
      - system:bootstrappers
      - system:nodes
      - system:node-proxier
      rolearn: arn:aws:iam::370179080679:role/eksctl-tutorial-cluster-FargatePodExecutionRole-1J605HWNTGS2Q
      username: system:node:{{SessionName}}
kind: ConfigMap
metadata:
  creationTimestamp: "2021-04-08T18:42:59Z"
  name: aws-auth
  namespace: kube-system
  resourceVersion: "918"
  selfLink: /api/v1/namespaces/kube-system/configmaps/aws-auth
  uid: d9a21964-a8bf-49e9-800f-650320b7444e
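
The troubleshooting page linked above fixes the console error by adding a mapUsers section to this same configmap; a minimal sketch, with the user name as a placeholder for the IAM user used for console login:

data:
  mapUsers: |
    - userarn: arn:aws:iam::370179080679:user/console-user
      username: console-user
      groups:
        - system:masters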

Solution

  • Creating an answer to sum up the discussion in the comments that was deemed acceptable. The most common (and arguably easiest) way to set up an EKS cluster with Fargate support is to use eksctl and create the cluster with eksctl create cluster --fargate. This builds all the plumbing for you: you get a cluster with no EC2 instances or managed node groups, and the two CoreDNS pods deployed on two Fargate instances. Note that when you deploy with eksctl from the command line you may end up using different roles/users between your CLI and the console, which can result in access-denied issues. The best course of action is to log in to the AWS console as a non-root user and use CloudShell to deploy with eksctl (CloudShell inherits the same console user identity). {More info in the comments}
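
For reference, a sketch of the eksctl invocation described above (cluster name and region are placeholders matching this question):

# run from CloudShell so the CLI identity matches the console login
eksctl create cluster \
  --name tutorial-cluster \
  --region eu-central-1 \
  --fargate

This creates the VPC, private subnets, NAT gateway, Fargate profile, and the aws-auth mapping in one go, which is why the CoreDNS image pulls succeed out of the box.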