Image from ECR not working on EKS: pods are always 0/2 ready


I have tried almost everything I could find, but I still cannot get my pods into the available state.

I have a basic application written in Go.

I built an image of the program using docker build --tag docker-gs-ping . and then ran it in a container using docker run --publish 8080:8080 docker-gs-ping

Next, I decided to store the image in Amazon ECR, so I created a repository there.

After the repository was created, I tagged my local image:

docker tag f49366b7f534 ****40312665.dkr.ecr.us-east-1.amazonaws.com/docker-gs-ping:latest

f49366b7f534 is the image ID on my local machine, and docker-gs-ping is the repository name in ECR.

I then pushed the tagged image to ECR with this command:

docker push ****40312665.dkr.ecr.us-east-1.amazonaws.com/docker-gs-ping:latest

I am not sure whether the above command pushes the tagged image or the most recent local image, since I see no way to specify a particular image to push to ECR.

The result so far is shown in two screenshots (not reproduced here).
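On the doubt about which image gets pushed: docker push uploads whatever local image the given reference currently points to, so after docker tag f49366b7f534 <reference>, pushing that reference pushes f49366b7f534. A minimal sketch of the login, tag, and push sequence with an explicit version tag follows. The function names, the placeholder account ID 123456789012, and the tag v1.0.0 are assumptions for illustration, not values from the question.

```shell
#!/usr/bin/env bash
# Sketch: push one specific local image to ECR under an explicit version tag.
set -e

# Build the full ECR image reference from its parts (pure string helper).
ecr_image_ref() {
  local account_id="$1" region="$2" repo="$3" tag="$4"
  printf '%s.dkr.ecr.%s.amazonaws.com/%s:%s\n' "$account_id" "$region" "$repo" "$tag"
}

# Log in to the registry, point the reference at the chosen image, and push it.
push_to_ecr() {
  local local_image="$1" account_id="$2" region="$3" repo="$4" tag="$5"
  local registry="${account_id}.dkr.ecr.${region}.amazonaws.com"
  local ref
  ref="$(ecr_image_ref "$account_id" "$region" "$repo" "$tag")"

  # Authenticate the local Docker client against the ECR registry.
  aws ecr get-login-password --region "$region" \
    | docker login --username AWS --password-stdin "$registry"

  # The reference now names exactly this image ID, so this is what gets pushed.
  docker tag "$local_image" "$ref"
  docker push "$ref"
}

# Example call (placeholder account ID and version tag):
# push_to_ecr f49366b7f534 123456789012 us-east-1 docker-gs-ping v1.0.0
```

The login step only authenticates the Docker client on the machine where it runs; whatever pulls the image inside a cluster needs its own credentials.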

After the above steps, I created a VPC and an EKS cluster using the templates and commands below:

EKS stack:

---
AWSTemplateFormatVersion: '2010-09-09'
Description: 'Amazon EKS Cluster'

Parameters:
  ClusterName:
    Type: String
    Default: my-eks-cluster
  NumberOfWorkerNodes:
    Type: Number
    Default: 1
  WorkerNodesInstanceType:
    Type: String
    Default: t2.micro
  KubernetesVersion:
    Type: String
    Default: "1.22"
    
Resources:

  ###########################################
  ## Roles
  ###########################################
  EksRole:
    Type: AWS::IAM::Role
    Properties: 
      RoleName: my.eks.cluster.role
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - eks.amazonaws.com
            Action:
              - sts:AssumeRole
      Path: /
      ManagedPolicyArns:
        - "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
  EksNodeRole:
    Type: AWS::IAM::Role
    Properties: 
      RoleName: my.eks.node.role
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - ec2.amazonaws.com
            Action:
              - sts:AssumeRole
      Path: /
      ManagedPolicyArns:
        - "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
        - "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
        - "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"

  ###########################################
  ## Eks Cluster
  ###########################################

  EksCluster:
    Type: AWS::EKS::Cluster
    Properties:
      Name: !Ref ClusterName
      Version: !Ref KubernetesVersion
      RoleArn: !GetAtt EksRole.Arn
      ResourcesVpcConfig:
        SecurityGroupIds:
          - !ImportValue ControlPlaneSecurityGroupId
        SubnetIds: !Split [ ',', !ImportValue PrivateSubnetIds ]

  EksNodegroup:
    Type: AWS::EKS::Nodegroup
    DependsOn: EksCluster
    Properties:
      ClusterName: !Ref ClusterName
      NodeRole: !GetAtt EksNodeRole.Arn
      ScalingConfig:
        MinSize:
          Ref: NumberOfWorkerNodes
        DesiredSize:
          Ref: NumberOfWorkerNodes
        MaxSize:
          Ref: NumberOfWorkerNodes
      Subnets: !Split [ ',', !ImportValue PrivateSubnetIds ]

Command: aws cloudformation create-stack --region us-east-1 --stack-name my-eks-cluster --capabilities CAPABILITY_NAMED_IAM --template-body file://eks-stack.yaml

EKS VPC YAML

---
AWSTemplateFormatVersion: '2010-09-09'
Description: 'Amazon EKS VPC - Private and Public subnets'

Parameters:

  VpcBlock:
    Type: String
    Default: 192.168.0.0/16
    Description: The CIDR range for the VPC. This should be a valid private (RFC 1918) CIDR range.

  PublicSubnet01Block:
    Type: String
    Default: 192.168.0.0/18
    Description: CidrBlock for public subnet 01 within the VPC

  PublicSubnet02Block:
    Type: String
    Default: 192.168.64.0/18
    Description: CidrBlock for public subnet 02 within the VPC

  PrivateSubnet01Block:
    Type: String
    Default: 192.168.128.0/18
    Description: CidrBlock for private subnet 01 within the VPC

  PrivateSubnet02Block:
    Type: String
    Default: 192.168.192.0/18
    Description: CidrBlock for private subnet 02 within the VPC

Metadata:
  AWS::CloudFormation::Interface:
    ParameterGroups:
      -
        Label:
          default: "Worker Network Configuration"
        Parameters:
          - VpcBlock
          - PublicSubnet01Block
          - PublicSubnet02Block
          - PrivateSubnet01Block
          - PrivateSubnet02Block

Resources:
  VPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock:  !Ref VpcBlock
      EnableDnsSupport: true
      EnableDnsHostnames: true
      Tags:
      - Key: Name
        Value: !Sub '${AWS::StackName}-VPC'

  InternetGateway:
    Type: "AWS::EC2::InternetGateway"

  VPCGatewayAttachment:
    Type: "AWS::EC2::VPCGatewayAttachment"
    Properties:
      InternetGatewayId: !Ref InternetGateway
      VpcId: !Ref VPC

  PublicRouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
      Tags:
      - Key: Name
        Value: Public Subnets
      - Key: Network
        Value: Public

  PrivateRouteTable01:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
      Tags:
      - Key: Name
        Value: Private Subnet AZ1
      - Key: Network
        Value: Private01

  PrivateRouteTable02:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
      Tags:
      - Key: Name
        Value: Private Subnet AZ2
      - Key: Network
        Value: Private02

  PublicRoute:
    DependsOn: VPCGatewayAttachment
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref PublicRouteTable
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId: !Ref InternetGateway

  PrivateRoute01:
    DependsOn:
    - VPCGatewayAttachment
    - NatGateway01
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref PrivateRouteTable01
      DestinationCidrBlock: 0.0.0.0/0
      NatGatewayId: !Ref NatGateway01

  PrivateRoute02:
    DependsOn:
    - VPCGatewayAttachment
    - NatGateway02
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref PrivateRouteTable02
      DestinationCidrBlock: 0.0.0.0/0
      NatGatewayId: !Ref NatGateway02

  NatGateway01:
    DependsOn:
    - NatGatewayEIP1
    - PublicSubnet01
    - VPCGatewayAttachment
    Type: AWS::EC2::NatGateway
    Properties:
      AllocationId: !GetAtt 'NatGatewayEIP1.AllocationId'
      SubnetId: !Ref PublicSubnet01
      Tags:
      - Key: Name
        Value: !Sub '${AWS::StackName}-NatGatewayAZ1'

  NatGateway02:
    DependsOn:
    - NatGatewayEIP2
    - PublicSubnet02
    - VPCGatewayAttachment
    Type: AWS::EC2::NatGateway
    Properties:
      AllocationId: !GetAtt 'NatGatewayEIP2.AllocationId'
      SubnetId: !Ref PublicSubnet02
      Tags:
      - Key: Name
        Value: !Sub '${AWS::StackName}-NatGatewayAZ2'

  NatGatewayEIP1:
    DependsOn:
    - VPCGatewayAttachment
    Type: 'AWS::EC2::EIP'
    Properties:
      Domain: vpc

  NatGatewayEIP2:
    DependsOn:
    - VPCGatewayAttachment
    Type: 'AWS::EC2::EIP'
    Properties:
      Domain: vpc

  PublicSubnet01:
    Type: AWS::EC2::Subnet
    Metadata:
      Comment: Subnet 01
    Properties:
      MapPublicIpOnLaunch: true
      AvailabilityZone:
        Fn::Select:
        - '0'
        - Fn::GetAZs:
            Ref: AWS::Region
      CidrBlock:
        Ref: PublicSubnet01Block
      VpcId:
        Ref: VPC
      Tags:
      - Key: Name
        Value: !Sub "${AWS::StackName}-PublicSubnet01"
      - Key: kubernetes.io/role/elb
        Value: 1

  PublicSubnet02:
    Type: AWS::EC2::Subnet
    Metadata:
      Comment: Subnet 02
    Properties:
      MapPublicIpOnLaunch: true
      AvailabilityZone:
        Fn::Select:
        - '1'
        - Fn::GetAZs:
            Ref: AWS::Region
      CidrBlock:
        Ref: PublicSubnet02Block
      VpcId:
        Ref: VPC
      Tags:
      - Key: Name
        Value: !Sub "${AWS::StackName}-PublicSubnet02"
      - Key: kubernetes.io/role/elb
        Value: 1

  PrivateSubnet01:
    Type: AWS::EC2::Subnet
    Metadata:
      Comment: Subnet 03
    Properties:
      AvailabilityZone:
        Fn::Select:
        - '0'
        - Fn::GetAZs:
            Ref: AWS::Region
      CidrBlock:
        Ref: PrivateSubnet01Block
      VpcId:
        Ref: VPC
      Tags:
      - Key: Name
        Value: !Sub "${AWS::StackName}-PrivateSubnet01"
      - Key: kubernetes.io/role/internal-elb
        Value: 1

  PrivateSubnet02:
    Type: AWS::EC2::Subnet
    Metadata:
      Comment: Private Subnet 02
    Properties:
      AvailabilityZone:
        Fn::Select:
        - '1'
        - Fn::GetAZs:
            Ref: AWS::Region
      CidrBlock:
        Ref: PrivateSubnet02Block
      VpcId:
        Ref: VPC
      Tags:
      - Key: Name
        Value: !Sub "${AWS::StackName}-PrivateSubnet02"
      - Key: kubernetes.io/role/internal-elb
        Value: 1

  PublicSubnet01RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PublicSubnet01
      RouteTableId: !Ref PublicRouteTable

  PublicSubnet02RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PublicSubnet02
      RouteTableId: !Ref PublicRouteTable

  PrivateSubnet01RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PrivateSubnet01
      RouteTableId: !Ref PrivateRouteTable01

  PrivateSubnet02RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PrivateSubnet02
      RouteTableId: !Ref PrivateRouteTable02

  ControlPlaneSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Cluster communication with worker nodes
      VpcId: !Ref VPC

Outputs:

  PublicSubnetIds:
    Description: Public Subnets IDs in the VPC
    Value: !Join [ ",", [ !Ref PublicSubnet01, !Ref PublicSubnet02 ] ]
    Export:
      Name: PublicSubnetIds
  
  PrivateSubnetIds:
    Description: Private Subnets IDs in the VPC
    Value: !Join [ ",", [ !Ref PrivateSubnet01, !Ref PrivateSubnet02 ] ]
    Export:
      Name: PrivateSubnetIds

  ControlPlaneSecurityGroupId:
    Description: Security group for the cluster control plane communication with worker nodes
    Value: !Ref ControlPlaneSecurityGroup
    Export:
      Name: ControlPlaneSecurityGroupId

  VpcId:
    Description: The VPC Id
    Value: !Ref VPC
    Export:
      Name: VpcId

Command: aws cloudformation create-stack --region us-east-1 --stack-name my-eks-vpc --template-body file://eks-vpc-stack.yaml
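The EKS template reads ControlPlaneSecurityGroupId and PrivateSubnetIds through !ImportValue, and those exports come from the VPC stack, so the VPC stack has to finish creating before the EKS stack can be created. The order can be sketched as below. The stack names, file names, and region are the ones from the commands above; the function names create_stack_and_wait and deploy_all are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: create the VPC stack first, then the EKS stack that imports its outputs.
set -e

REGION="us-east-1"

# Create a stack and block until CloudFormation reports that creation completed.
# Any arguments after the template file are passed through to create-stack.
create_stack_and_wait() {
  local stack_name="$1" template="$2"
  shift 2
  aws cloudformation create-stack --region "$REGION" \
    --stack-name "$stack_name" --template-body "file://$template" "$@"
  aws cloudformation wait stack-create-complete --region "$REGION" \
    --stack-name "$stack_name"
}

deploy_all() {
  # The VPC stack exports the subnet and security group IDs.
  create_stack_and_wait my-eks-vpc eks-vpc-stack.yaml
  # The EKS stack imports them and creates named IAM roles.
  create_stack_and_wait my-eks-cluster eks-stack.yaml --capabilities CAPABILITY_NAMED_IAM
}

# Example call:
# deploy_all
```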

Result after the commands: (screenshot not reproduced here)

Next, I tried to apply the deployment.yaml and service.yaml files.

deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: helloworld
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: helloworld
  template:
    metadata:
      labels:
        app: helloworld
    spec:
      containers:
        - name: new-container
          image: ****40312665.dkr.ecr.us-east-1.amazonaws.com/docker-gs-ping:latest
          ports:
            - containerPort: 80

Command and result: (screenshot not reproduced here)

Now service.yaml

apiVersion: v1
kind: Service
metadata:
  name: helloworld
spec:
  type: LoadBalancer
  selector:
    app: helloworld
  ports:
    - name: http
      port: 80
      targetPort: 80

Command and result: (screenshot not reproduced here)

After all this, kubectl get deployments shows the deployment with 0/2 pods ready (screenshot not reproduced here).

To debug, I ran kubectl describe pod helloworld and got the following:

C:\Users\visratna\GolandProjects\testaws>kubectl describe pod helloworld
Name:             helloworld-c6dc56598-jmpvr
Namespace:        default
Priority:         0
Service Account:  default
Node:             docker-desktop/192.168.65.4
Start Time:       Fri, 07 Jul 2023 22:22:18 +0530
Labels:           app=helloworld
                  pod-template-hash=c6dc56598
Annotations:      <none>
Status:           Pending
IP:               10.1.0.7
IPs:
  IP:           10.1.0.7
Controlled By:  ReplicaSet/helloworld-c6dc56598
Containers:
  new-container:
    Container ID:
    Image:          549840312665.dkr.ecr.us-east-1.amazonaws.com/docker-gs-ping:latest
    Image ID:
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ImagePullBackOff
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-sldvv (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kube-api-access-sldvv:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  23m                   default-scheduler  Successfully assigned default/helloworld-c6dc56598-jmpvr to docker-desktop
  Normal   Pulling    22m (x4 over 23m)     kubelet            Pulling image "549840312665.dkr.ecr.us-east-1.amazonaws.com/docker-gs-ping:latest"
  Warning  Failed     22m (x4 over 23m)     kubelet            Failed to pull image "549840312665.dkr.ecr.us-east-1.amazonaws.com/docker-gs-ping:latest": rpc error: code = Unknown desc = Error response from daemon: Head "https://549840312665.dkr.ecr.us-east-1.amazonaws.com/v2/docker-gs-ping/manifests/latest": no basic auth credentials
  Warning  Failed     22m (x4 over 23m)     kubelet            Error: ErrImagePull
  Warning  Failed     22m (x6 over 23m)     kubelet            Error: ImagePullBackOff
  Normal   BackOff    3m47s (x85 over 23m)  kubelet            Back-off pulling image "549840312665.dkr.ecr.us-east-1.amazonaws.com/docker-gs-ping:latest"

Name:             helloworld-c6dc56598-r9b4d
Namespace:        default
Priority:         0
Service Account:  default
Node:             docker-desktop/192.168.65.4
Start Time:       Fri, 07 Jul 2023 22:22:18 +0530
Labels:           app=helloworld
                  pod-template-hash=c6dc56598
Annotations:      <none>
Status:           Pending
IP:               10.1.0.6
IPs:
  IP:           10.1.0.6
Controlled By:  ReplicaSet/helloworld-c6dc56598
Containers:
  new-container:
    Container ID:
    Image:          549840312665.dkr.ecr.us-east-1.amazonaws.com/docker-gs-ping:latest
    Image ID:
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ImagePullBackOff
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-84rw4 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kube-api-access-84rw4:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  23m                   default-scheduler  Successfully assigned default/helloworld-c6dc56598-r9b4d to docker-desktop
  Normal   Pulling    22m (x4 over 23m)     kubelet            Pulling image "549840312665.dkr.ecr.us-east-1.amazonaws.com/docker-gs-ping:latest"
  Warning  Failed     22m (x4 over 23m)     kubelet            Failed to pull image "549840312665.dkr.ecr.us-east-1.amazonaws.com/docker-gs-ping:latest": rpc error: code = Unknown desc = Error response from daemon: Head "https://549840312665.dkr.ecr.us-east-1.amazonaws.com/v2/docker-gs-ping/manifests/latest": no basic auth credentials
  Warning  Failed     22m (x4 over 23m)     kubelet            Error: ErrImagePull
  Warning  Failed     22m (x6 over 23m)     kubelet            Error: ImagePullBackOff
  Normal   BackOff    3m43s (x86 over 23m)  kubelet            Back-off pulling image "549840312665.dkr.ecr.us-east-1.amazonaws.com/docker-gs-ping:latest"

I have tried many solutions suggested on Stack Overflow, but nothing seems to work for me. Any suggestions on how I can get things working? Thank you very much in advance.


Solution

  • A couple of things. First, avoid using the latest tag; it is an anti-pattern. When you push the image to ECR, use the build tag or a version number as the image tag. Second, verify that your worker nodes have permission to pull images from ECR, specifically through the AmazonEC2ContainerRegistryReadOnly policy attached to the node role. Without it, the kubelet cannot pull images from ECR. If the registry is in a different account than the cluster, you also need to create a repository [resource] policy. See https://docs.aws.amazon.com/AmazonECR/latest/userguide/repository-policies.html.
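The permission check can be sketched roughly as follows, using the role name my.eks.node.role and cluster name my-eks-cluster from the question's template. The function names are illustrative assumptions. Note also that the describe output in the question lists the node as docker-desktop, which suggests kubectl may be pointed at a local Docker Desktop cluster rather than EKS; the node role only applies to EKS worker nodes, so it is worth confirming the active context as well.

```shell
#!/usr/bin/env bash
# Sketch: verify the node role's ECR policy and point kubectl at the EKS cluster.
set -e

REGION="us-east-1"
CLUSTER="my-eks-cluster"
NODE_ROLE="my.eks.node.role"
ECR_POLICY_ARN="arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"

# Succeeds if the ECR read-only policy ARN appears in a whitespace-separated list.
has_ecr_read_policy() {
  printf '%s\n' "$1" | tr '\t ' '\n\n' | grep -qxF "$ECR_POLICY_ARN"
}

# List the managed policies attached to the node role and look for the ECR one.
check_node_role() {
  local attached
  attached="$(aws iam list-attached-role-policies --role-name "$NODE_ROLE" \
    --query 'AttachedPolicies[].PolicyArn' --output text)"
  if has_ecr_read_policy "$attached"; then
    echo "node role can pull from ECR"
  else
    echo "node role is missing $ECR_POLICY_ARN" >&2
    return 1
  fi
}

# Show the current kubectl context, switch to the EKS cluster, and list its nodes.
use_eks_context() {
  kubectl config current-context
  aws eks update-kubeconfig --region "$REGION" --name "$CLUSTER"
  kubectl get nodes -o wide
}

# Example calls:
# check_node_role
# use_eks_context
```

If kubectl get nodes still shows docker-desktop after switching, the deployment is not running on the EKS worker nodes and the node role is not involved in the image pull.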