Tags: yaml, aws-cloudformation, amazon-vpc, amazon-eks, aws-cloudformation-custom-resource

Unable to join an EKS node group to an existing EKS cluster in AWS CloudFormation


This template creates a node group that should be deployed into an existing EKS cluster and VPC. The stack deploys successfully, but I don't see the node group inside my existing EKS cluster.
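(For reference, the stack is deployed with something along these lines; the stack name and parameter values are placeholders. --capabilities CAPABILITY_IAM is needed because the template creates an IAM role.)

aws cloudformation deploy \
  --stack-name eks-nodegroup \
  --template-file nodegroup.yaml \
  --capabilities CAPABILITY_IAM \
  --parameter-overrides \
      ClusterName=my-cluster \
      ClusterControlPlaneSecurityGroup=sg-0123456789abcdef0 \
      VpcId=vpc-0123456789abcdef0 \
      "Subnets=subnet-aaaa1111,subnet-bbbb2222" \
      KeyName=my-keypair \
      NodeGroupName=workers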

AWSTemplateFormatVersion: "2010-09-09"

Description: Amazon EKS - Node Group

Metadata:
  "AWS::CloudFormation::Interface":
    ParameterGroups:
      - Label:
          default: EKS Cluster
        Parameters:
          - ClusterName
          - ClusterControlPlaneSecurityGroup
      - Label:
          default: Worker Node Configuration
        Parameters:
          - NodeGroupName
          - NodeAutoScalingGroupMinSize
          - NodeAutoScalingGroupDesiredCapacity
          - NodeAutoScalingGroupMaxSize
          - NodeInstanceType
          - NodeImageIdSSMParam
          - NodeImageId
          - NodeVolumeSize
          - KeyName
          - BootstrapArguments
      - Label:
          default: Worker Network Configuration
        Parameters:
          - VpcId
          - Subnets

Parameters:
  BootstrapArguments:
    Type: String
    Default: ""
    Description: "Arguments to pass to the bootstrap script. See files/bootstrap.sh in https://github.com/awslabs/amazon-eks-ami"

  ClusterControlPlaneSecurityGroup:
    Type: "AWS::EC2::SecurityGroup::Id"
    Description: The security group of the cluster control plane.

  ClusterName:
    Type: String
    Description: The cluster name provided when the cluster was created. If it is incorrect, nodes will not be able to join the cluster.

  KeyName:
    Type: "AWS::EC2::KeyPair::KeyName"
    Description: The EC2 Key Pair to allow SSH access to the instances

  NodeAutoScalingGroupDesiredCapacity:
    Type: Number
    Default: 3
    Description: Desired capacity of Node Group ASG.

  NodeAutoScalingGroupMaxSize:
    Type: Number
    Default: 4
    Description: Maximum size of Node Group ASG. Set to at least 1 greater than NodeAutoScalingGroupDesiredCapacity.

  NodeAutoScalingGroupMinSize:
    Type: Number
    Default: 1
    Description: Minimum size of Node Group ASG.

  NodeGroupName:
    Type: String
    Description: Unique identifier for the Node Group.

  NodeImageId:
    Type: String
    Default: ""
    Description: (Optional) Specify your own custom image ID. This value overrides any AWS Systems Manager Parameter Store value specified above.

  NodeImageIdSSMParam:
    Type: "AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>"
    Default: /aws/service/eks/optimized-ami/1.14/amazon-linux-2/recommended/image_id
    Description: AWS Systems Manager Parameter Store parameter of the AMI ID for the worker node instances.

  NodeInstanceType:
    Type: String
    Default: t3.medium
    AllowedValues:
      - a1.medium
      - a1.large
      - a1.xlarge
      - a1.2xlarge
      - a1.4xlarge
      - c1.medium
      - c1.xlarge
      - c3.large
      - c3.xlarge
      - c3.2xlarge
      - c3.4xlarge
      - c3.8xlarge
      - c4.large
      - c4.xlarge
      - c4.2xlarge
      - c4.4xlarge
      - c4.8xlarge
      - c5.large
      - c5.xlarge
      - c5.2xlarge
      - c5.4xlarge
      - c5.9xlarge
      - c5.12xlarge
      - c5.18xlarge
      - c5.24xlarge
      - c5.metal
      - c5d.large
      - c5d.xlarge
      - c5d.2xlarge
      - c5d.4xlarge
      - c5d.9xlarge
      - c5d.18xlarge
      - c5n.large
      - c5n.xlarge
      - c5n.2xlarge
      - c5n.4xlarge
      - c5n.9xlarge
      - c5n.18xlarge
      - cc2.8xlarge
      - cr1.8xlarge
      - d2.xlarge
      - d2.2xlarge
      - d2.4xlarge
      - d2.8xlarge
      - f1.2xlarge
      - f1.4xlarge
      - f1.16xlarge
      - g2.2xlarge
      - g2.8xlarge
      - g3s.xlarge
      - g3.4xlarge
      - g3.8xlarge
      - g3.16xlarge
      - h1.2xlarge
      - h1.4xlarge
      - h1.8xlarge
      - h1.16xlarge
      - hs1.8xlarge
      - i2.xlarge
      - i2.2xlarge
      - i2.4xlarge
      - i2.8xlarge
      - i3.large
      - i3.xlarge
      - i3.2xlarge
      - i3.4xlarge
      - i3.8xlarge
      - i3.16xlarge
      - i3.metal
      - i3en.large
      - i3en.xlarge
      - i3en.2xlarge
      - i3en.3xlarge
      - i3en.6xlarge
      - i3en.12xlarge
      - i3en.24xlarge
      - m1.small
      - m1.medium
      - m1.large
      - m1.xlarge
      - m2.xlarge
      - m2.2xlarge
      - m2.4xlarge
      - m3.medium
      - m3.large
      - m3.xlarge
      - m3.2xlarge
      - m4.large
      - m4.xlarge
      - m4.2xlarge
      - m4.4xlarge
      - m4.10xlarge
      - m4.16xlarge
      - m5.large
      - m5.xlarge
      - m5.2xlarge
      - m5.4xlarge
      - m5.8xlarge
      - m5.12xlarge
      - m5.16xlarge
      - m5.24xlarge
      - m5.metal
      - m5a.large
      - m5a.xlarge
      - m5a.2xlarge
      - m5a.4xlarge
      - m5a.8xlarge
      - m5a.12xlarge
      - m5a.16xlarge
      - m5a.24xlarge
      - m5ad.large
      - m5ad.xlarge
      - m5ad.2xlarge
      - m5ad.4xlarge
      - m5ad.12xlarge
      - m5ad.24xlarge
      - m5d.large
      - m5d.xlarge
      - m5d.2xlarge
      - m5d.4xlarge
      - m5d.8xlarge
      - m5d.12xlarge
      - m5d.16xlarge
      - m5d.24xlarge
      - m5d.metal
      - p2.xlarge
      - p2.8xlarge
      - p2.16xlarge
      - p3.2xlarge
      - p3.8xlarge
      - p3.16xlarge
      - p3dn.24xlarge
      - g4dn.xlarge
      - g4dn.2xlarge
      - g4dn.4xlarge
      - g4dn.8xlarge
      - g4dn.12xlarge
      - g4dn.16xlarge
      - g4dn.metal
      - r3.large
      - r3.xlarge
      - r3.2xlarge
      - r3.4xlarge
      - r3.8xlarge
      - r4.large
      - r4.xlarge
      - r4.2xlarge
      - r4.4xlarge
      - r4.8xlarge
      - r4.16xlarge
      - r5.large
      - r5.xlarge
      - r5.2xlarge
      - r5.4xlarge
      - r5.8xlarge
      - r5.12xlarge
      - r5.16xlarge
      - r5.24xlarge
      - r5.metal
      - r5a.large
      - r5a.xlarge
      - r5a.2xlarge
      - r5a.4xlarge
      - r5a.8xlarge
      - r5a.12xlarge
      - r5a.16xlarge
      - r5a.24xlarge
      - r5ad.large
      - r5ad.xlarge
      - r5ad.2xlarge
      - r5ad.4xlarge
      - r5ad.12xlarge
      - r5ad.24xlarge
      - r5d.large
      - r5d.xlarge
      - r5d.2xlarge
      - r5d.4xlarge
      - r5d.8xlarge
      - r5d.12xlarge
      - r5d.16xlarge
      - r5d.24xlarge
      - r5d.metal
      - t1.micro
      - t2.nano
      - t2.micro
      - t2.small
      - t2.medium
      - t2.large
      - t2.xlarge
      - t2.2xlarge
      - t3.nano
      - t3.micro
      - t3.small
      - t3.medium
      - t3.large
      - t3.xlarge
      - t3.2xlarge
      - t3a.nano
      - t3a.micro
      - t3a.small
      - t3a.medium
      - t3a.large
      - t3a.xlarge
      - t3a.2xlarge
      - u-6tb1.metal
      - u-9tb1.metal
      - u-12tb1.metal
      - x1.16xlarge
      - x1.32xlarge
      - x1e.xlarge
      - x1e.2xlarge
      - x1e.4xlarge
      - x1e.8xlarge
      - x1e.16xlarge
      - x1e.32xlarge
      - z1d.large
      - z1d.xlarge
      - z1d.2xlarge
      - z1d.3xlarge
      - z1d.6xlarge
      - z1d.12xlarge
      - z1d.metal
    ConstraintDescription: Must be a valid EC2 instance type
    Description: EC2 instance type for the node instances

  NodeVolumeSize:
    Type: Number
    Default: 20
    Description: Node volume size

  Subnets:
    Type: "List<AWS::EC2::Subnet::Id>"
    Description: The subnets where workers can be created.

  VpcId:
    Type: "AWS::EC2::VPC::Id"
    Description: The VPC of the worker instances

Conditions:
  HasNodeImageId: !Not
    - "Fn::Equals":
        - Ref: NodeImageId
        - ""

Resources:
  NodeInstanceRole:
    Type: "AWS::IAM::Role"
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - ec2.amazonaws.com
            Action:
              - "sts:AssumeRole"
      ManagedPolicyArns:
        - "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
        - "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
        - "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
      Path: /

  NodeInstanceProfile:
    Type: "AWS::IAM::InstanceProfile"
    Properties:
      Path: /
      Roles:
        - Ref: NodeInstanceRole

  NodeSecurityGroup:
    Type: "AWS::EC2::SecurityGroup"
    Properties:
      GroupDescription: Security group for all nodes in the cluster
      Tags:
        - Key: !Sub kubernetes.io/cluster/${ClusterName}
          Value: owned
      VpcId: !Ref VpcId

  NodeSecurityGroupIngress:
    Type: "AWS::EC2::SecurityGroupIngress"
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow node to communicate with each other
      FromPort: 0
      GroupId: !Ref NodeSecurityGroup
      IpProtocol: "-1"
      SourceSecurityGroupId: !Ref NodeSecurityGroup
      ToPort: 65535

  ClusterControlPlaneSecurityGroupIngress:
    Type: "AWS::EC2::SecurityGroupIngress"
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow pods to communicate with the cluster API Server
      FromPort: 443
      GroupId: !Ref ClusterControlPlaneSecurityGroup
      IpProtocol: tcp
      SourceSecurityGroupId: !Ref NodeSecurityGroup
      ToPort: 443

  ControlPlaneEgressToNodeSecurityGroup:
    Type: "AWS::EC2::SecurityGroupEgress"
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow the cluster control plane to communicate with worker Kubelet and pods
      DestinationSecurityGroupId: !Ref NodeSecurityGroup
      FromPort: 1025
      GroupId: !Ref ClusterControlPlaneSecurityGroup
      IpProtocol: tcp
      ToPort: 65535

  ControlPlaneEgressToNodeSecurityGroupOn443:
    Type: "AWS::EC2::SecurityGroupEgress"
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow the cluster control plane to communicate with pods running extension API servers on port 443
      DestinationSecurityGroupId: !Ref NodeSecurityGroup
      FromPort: 443
      GroupId: !Ref ClusterControlPlaneSecurityGroup
      IpProtocol: tcp
      ToPort: 443

  NodeSecurityGroupFromControlPlaneIngress:
    Type: "AWS::EC2::SecurityGroupIngress"
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow worker Kubelets and pods to receive communication from the cluster control plane
      FromPort: 1025
      GroupId: !Ref NodeSecurityGroup
      IpProtocol: tcp
      SourceSecurityGroupId: !Ref ClusterControlPlaneSecurityGroup
      ToPort: 65535

  NodeSecurityGroupFromControlPlaneOn443Ingress:
    Type: "AWS::EC2::SecurityGroupIngress"
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow pods running extension API servers on port 443 to receive communication from cluster control plane
      FromPort: 443
      GroupId: !Ref NodeSecurityGroup
      IpProtocol: tcp
      SourceSecurityGroupId: !Ref ClusterControlPlaneSecurityGroup
      ToPort: 443

The problem seems to be somewhere around here:

  NodeLaunchConfig:
    Type: "AWS::AutoScaling::LaunchConfiguration"
    Properties:
      AssociatePublicIpAddress: "true"
      BlockDeviceMappings:
        - DeviceName: /dev/xvda
          Ebs:
            DeleteOnTermination: true
            VolumeSize: !Ref NodeVolumeSize
            VolumeType: gp2
      IamInstanceProfile: !Ref NodeInstanceProfile
      ImageId: !If
        - HasNodeImageId
        - Ref: NodeImageId
        - Ref: NodeImageIdSSMParam
      InstanceType: !Ref NodeInstanceType
      KeyName: !Ref KeyName
      SecurityGroups:
        - Ref: NodeSecurityGroup
      UserData: !Base64
        "Fn::Sub": |
          #!/bin/bash
          set -o xtrace
          /etc/eks/bootstrap.sh ${ClusterName} ${BootstrapArguments}
          /opt/aws/bin/cfn-signal --exit-code $? \
                   --stack  ${AWS::StackName} \
                   --resource NodeGroup  \
                   --region ${AWS::Region}

Or maybe over here:

  NodeGroup:
    Type: "AWS::AutoScaling::AutoScalingGroup"
    Properties:
      DesiredCapacity: !Ref NodeAutoScalingGroupDesiredCapacity
      LaunchConfigurationName: !Ref NodeLaunchConfig
      MaxSize: !Ref NodeAutoScalingGroupMaxSize
      MinSize: !Ref NodeAutoScalingGroupMinSize
      Tags:
        - Key: Name
          PropagateAtLaunch: "true"
          Value: !Sub ${ClusterName}-${NodeGroupName}-Node
        - Key: !Sub kubernetes.io/cluster/${ClusterName}
          PropagateAtLaunch: "true"
          Value: owned
      VPCZoneIdentifier: !Ref Subnets
    UpdatePolicy:
      AutoScalingRollingUpdate:
        MaxBatchSize: "1"
        MinInstancesInService: !Ref NodeAutoScalingGroupDesiredCapacity
        PauseTime: PT5M

Outputs:
  NodeInstanceRole:
    Description: The node instance role
    Value: !GetAtt NodeInstanceRole.Arn

  NodeSecurityGroup:
    Description: The security group for the node group
    Value: !Ref NodeSecurityGroup

Although the template deploys without errors, the node group isn't visible in my EKS cluster. Please let me know what changes are needed so that the node group gets deployed into the cluster.


Solution

  • I ran into the same problem. The issue is the resource type you chose for the node group: it should be AWS::EKS::Nodegroup. An AWS::AutoScaling::AutoScalingGroup only creates self-managed worker instances, so nothing shows up as a node group in the cluster. Change the type and your node group will be visible in the cluster.

    Here is the documentation for that resource type: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-eks-nodegroup.html
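
    As a rough sketch (not a drop-in replacement), the NodeGroup resource could be rewritten as a managed node group, reusing the parameters already defined in your template. With AWS::EKS::Nodegroup, EKS manages the launch configuration and the bootstrap for you, so NodeLaunchConfig and the UserData script are no longer needed. The logical ID ManagedNodeGroup is arbitrary:

      ManagedNodeGroup:
        Type: "AWS::EKS::Nodegroup"
        Properties:
          ClusterName: !Ref ClusterName            # must match the existing cluster
          NodegroupName: !Ref NodeGroupName
          NodeRole: !GetAtt NodeInstanceRole.Arn   # role ARN, not the instance profile
          Subnets: !Ref Subnets
          InstanceTypes:
            - !Ref NodeInstanceType
          DiskSize: !Ref NodeVolumeSize
          ScalingConfig:
            MinSize: !Ref NodeAutoScalingGroupMinSize
            DesiredSize: !Ref NodeAutoScalingGroupDesiredCapacity
            MaxSize: !Ref NodeAutoScalingGroupMaxSize
          RemoteAccess:
            Ec2SshKey: !Ref KeyName

    After the stack update, the node group should appear under the cluster's compute configuration, or via aws eks list-nodegroups --cluster-name <your-cluster-name>.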