I am managing the infrastructure on AWS using Cloudformation Script. An EKS cluster with managed nodegroups is created and the instance type is m5a.16xlarge(64 cores,256gb ram). Now the maximum number of pods the instance can have is 737 which is based upon max ENIs and number of Ip addresses in the subnetwork. I need to restrict the number of pods on my managed nodegroup such that it does not go beyond a specific custom count. What should I do to achieve that?
I finally found the answer to this. We can restrict the number of pods on a specific eks cluster by using Custom AMI's for worker nodes.
Here is the link for creating the custom AMI:
https://aws.amazon.com/premiumsupport/knowledge-center/eks-custom-linux-ami/
This link provides a way to clone a repository into your local machine, customise it and then create the ami in your aws account. The local machine or any ec2 instance where the repo is being cloned must be configured with aws cli latest version.
Once the repo is cloned we can edit the configuration of /sampleDir/files/eni-max-pods.txt file which consists of all the supported instances with their max pods value. /sampleDir/ is the directory where we are cloning the repository.
After this we need to run the 'make' command and it will automatically create the custom ami in any given region. We can then use a cloudformation template to launch managed nodes or self managed nodes on eks.
Here is a sample cloudformation template.
Note: We can use the same CF Template for managed nodegroups by editing the nodegroup section with type: Aws::EKS::Nodegroup
Description: Amazon EKS - Node Group
Metadata:
"AWS::CloudFormation::Interface":
ParameterGroups:
- Label:
default: EKS Cluster
Parameters:
- ClusterName
- ClusterControlPlaneSecurityGroup
- Label:
default: Worker Node Configuration
Parameters:
- NodeGroupName
- NodeAutoScalingGroupMinSize
- NodeAutoScalingGroupDesiredCapacity
- NodeAutoScalingGroupMaxSize
- NodeInstanceType
- NodeImageIdSSMParam
- NodeImageId
- NodeVolumeSize
- KeyName
- BootstrapArguments
- DisableIMDSv1
- Label:
default: Worker Network Configuration
Parameters:
- VpcId
- Subnets
Parameters:
BootstrapArguments:
Type: String
Default: ""
Description: "Arguments to pass to the bootstrap script. See files/bootstrap.sh in https://github.com/awslabs/amazon-eks-ami"
ClusterControlPlaneSecurityGroup:
Type: "AWS::EC2::SecurityGroup::Id"
Description: The security group of the cluster control plane.
ClusterName:
Type: String
Description: The cluster name provided when the cluster was created. If it is incorrect, nodes will not be able to join the cluster.
KeyName:
Type: "AWS::EC2::KeyPair::KeyName"
Description: The EC2 Key Pair to allow SSH access to the instances
NodeAutoScalingGroupDesiredCapacity:
Type: Number
Default: 3
Description: Desired capacity of Node Group ASG.
NodeAutoScalingGroupMaxSize:
Type: Number
Default: 4
Description: Maximum size of Node Group ASG. Set to at least 1 greater than NodeAutoScalingGroupDesiredCapacity.
NodeAutoScalingGroupMinSize:
Type: Number
Default: 1
Description: Minimum size of Node Group ASG.
NodeGroupName:
Type: String
Description: Unique identifier for the Node Group.
NodeImageId:
Type: String
Default: ""
Description: (Optional) Specify your own custom image ID. This value overrides any AWS Systems Manager Parameter Store value specified above.
NodeImageIdSSMParam:
Type: "AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>"
Default: /aws/service/eks/optimized-ami/1.17/amazon-linux-2/recommended/image_id
Description: AWS Systems Manager Parameter Store parameter of the AMI ID for the worker node instances. Change this value to match the version of Kubernetes you are using.
DisableIMDSv1:
Type: String
Default: "false"
AllowedValues:
- "false"
- "true"
NodeInstanceType:
Type: String
Default: t3.medium
AllowedValues:
- a1.2xlarge
- a1.4xlarge
- a1.large
- a1.medium
- a1.metal
- a1.xlarge
- c1.medium
- c1.xlarge
- c3.2xlarge
- c3.4xlarge
- c3.8xlarge
- c3.large
- c3.xlarge
- c4.2xlarge
- c4.4xlarge
- c4.8xlarge
- c4.large
- c4.xlarge
- c5.12xlarge
- c5.18xlarge
- c5.24xlarge
- c5.2xlarge
- c5.4xlarge
- c5.9xlarge
- c5.large
- c5.metal
- c5.xlarge
- c5a.12xlarge
- c5a.16xlarge
- c5a.24xlarge
- c5a.2xlarge
- c5a.4xlarge
- c5a.8xlarge
- c5a.large
- c5a.metal
- c5a.xlarge
- c5ad.12xlarge
- c5ad.16xlarge
- c5ad.24xlarge
- c5ad.2xlarge
- c5ad.4xlarge
- c5ad.8xlarge
- c5ad.large
- c5ad.metal
- c5ad.xlarge
- c5d.12xlarge
- c5d.18xlarge
- c5d.24xlarge
- c5d.2xlarge
- c5d.4xlarge
- c5d.9xlarge
- c5d.large
- c5d.metal
- c5d.xlarge
- c5n.18xlarge
- c5n.2xlarge
- c5n.4xlarge
- c5n.9xlarge
- c5n.large
- c5n.metal
- c5n.xlarge
- c6g.12xlarge
- c6g.16xlarge
- c6g.2xlarge
- c6g.4xlarge
- c6g.8xlarge
- c6g.large
- c6g.medium
- c6g.metal
- c6g.xlarge
- c6gd.12xlarge
- c6gd.16xlarge
- c6gd.2xlarge
- c6gd.4xlarge
- c6gd.8xlarge
- c6gd.large
- c6gd.medium
- c6gd.metal
- c6gd.xlarge
- c6gn.12xlarge
- c6gn.16xlarge
- c6gn.2xlarge
- c6gn.4xlarge
- c6gn.8xlarge
- c6gn.large
- c6gn.medium
- c6gn.xlarge
- cc2.8xlarge
- cr1.8xlarge
- d2.2xlarge
- d2.4xlarge
- d2.8xlarge
- d2.xlarge
- d3.2xlarge
- d3.4xlarge
- d3.8xlarge
- d3.xlarge
- d3en.12xlarge
- d3en.2xlarge
- d3en.4xlarge
- d3en.6xlarge
- d3en.8xlarge
- d3en.xlarge
- f1.16xlarge
- f1.2xlarge
- f1.4xlarge
- g2.2xlarge
- g2.8xlarge
- g3.16xlarge
- g3.4xlarge
- g3.8xlarge
- g3s.xlarge
- g4ad.16xlarge
- g4ad.4xlarge
- g4ad.8xlarge
- g4dn.12xlarge
- g4dn.16xlarge
- g4dn.2xlarge
- g4dn.4xlarge
- g4dn.8xlarge
- g4dn.metal
- g4dn.xlarge
- h1.16xlarge
- h1.2xlarge
- h1.4xlarge
- h1.8xlarge
- hs1.8xlarge
- i2.2xlarge
- i2.4xlarge
- i2.8xlarge
- i2.xlarge
- i3.16xlarge
- i3.2xlarge
- i3.4xlarge
- i3.8xlarge
- i3.large
- i3.metal
- i3.xlarge
- i3en.12xlarge
- i3en.24xlarge
- i3en.2xlarge
- i3en.3xlarge
- i3en.6xlarge
- i3en.large
- i3en.metal
- i3en.xlarge
- inf1.24xlarge
- inf1.2xlarge
- inf1.6xlarge
- inf1.xlarge
- m1.large
- m1.medium
- m1.small
- m1.xlarge
- m2.2xlarge
- m2.4xlarge
- m2.xlarge
- m3.2xlarge
- m3.large
- m3.medium
- m3.xlarge
- m4.10xlarge
- m4.16xlarge
- m4.2xlarge
- m4.4xlarge
- m4.large
- m4.xlarge
- m5.12xlarge
- m5.16xlarge
- m5.24xlarge
- m5.2xlarge
- m5.4xlarge
- m5.8xlarge
- m5.large
- m5.metal
- m5.xlarge
- m5a.12xlarge
- m5a.16xlarge
- m5a.24xlarge
- m5a.2xlarge
- m5a.4xlarge
- m5a.8xlarge
- m5a.large
- m5a.xlarge
- m5ad.12xlarge
- m5ad.16xlarge
- m5ad.24xlarge
- m5ad.2xlarge
- m5ad.4xlarge
- m5ad.8xlarge
- m5ad.large
- m5ad.xlarge
- m5d.12xlarge
- m5d.16xlarge
- m5d.24xlarge
- m5d.2xlarge
- m5d.4xlarge
- m5d.8xlarge
- m5d.large
- m5d.metal
- m5d.xlarge
- m5dn.12xlarge
- m5dn.16xlarge
- m5dn.24xlarge
- m5dn.2xlarge
- m5dn.4xlarge
- m5dn.8xlarge
- m5dn.large
- m5dn.xlarge
- m5n.12xlarge
- m5n.16xlarge
- m5n.24xlarge
- m5n.2xlarge
- m5n.4xlarge
- m5n.8xlarge
- m5n.large
- m5n.xlarge
- m5zn.12xlarge
- m5zn.2xlarge
- m5zn.3xlarge
- m5zn.6xlarge
- m5zn.large
- m5zn.metal
- m5zn.xlarge
- m6g.12xlarge
- m6g.16xlarge
- m6g.2xlarge
- m6g.4xlarge
- m6g.8xlarge
- m6g.large
- m6g.medium
- m6g.metal
- m6g.xlarge
- m6gd.12xlarge
- m6gd.16xlarge
- m6gd.2xlarge
- m6gd.4xlarge
- m6gd.8xlarge
- m6gd.large
- m6gd.medium
- m6gd.metal
- m6gd.xlarge
- mac1.metal
- p2.16xlarge
- p2.8xlarge
- p2.xlarge
- p3.16xlarge
- p3.2xlarge
- p3.8xlarge
- p3dn.24xlarge
- p4d.24xlarge
- r3.2xlarge
- r3.4xlarge
- r3.8xlarge
- r3.large
- r3.xlarge
- r4.16xlarge
- r4.2xlarge
- r4.4xlarge
- r4.8xlarge
- r4.large
- r4.xlarge
- r5.12xlarge
- r5.16xlarge
- r5.24xlarge
- r5.2xlarge
- r5.4xlarge
- r5.8xlarge
- r5.large
- r5.metal
- r5.xlarge
- r5a.12xlarge
- r5a.16xlarge
- r5a.24xlarge
- r5a.2xlarge
- r5a.4xlarge
- r5a.8xlarge
- r5a.large
- r5a.xlarge
- r5ad.12xlarge
- r5ad.16xlarge
- r5ad.24xlarge
- r5ad.2xlarge
- r5ad.4xlarge
- r5ad.8xlarge
- r5ad.large
- r5ad.xlarge
- r5b.12xlarge
- r5b.16xlarge
- r5b.24xlarge
- r5b.2xlarge
- r5b.4xlarge
- r5b.8xlarge
- r5b.large
- r5b.metal
- r5b.xlarge
- r5d.12xlarge
- r5d.16xlarge
- r5d.24xlarge
- r5d.2xlarge
- r5d.4xlarge
- r5d.8xlarge
- r5d.large
- r5d.metal
- r5d.xlarge
- r5dn.12xlarge
- r5dn.16xlarge
- r5dn.24xlarge
- r5dn.2xlarge
- r5dn.4xlarge
- r5dn.8xlarge
- r5dn.large
- r5dn.xlarge
- r5n.12xlarge
- r5n.16xlarge
- r5n.24xlarge
- r5n.2xlarge
- r5n.4xlarge
- r5n.8xlarge
- r5n.large
- r5n.xlarge
- r6g.12xlarge
- r6g.16xlarge
- r6g.2xlarge
- r6g.4xlarge
- r6g.8xlarge
- r6g.large
- r6g.medium
- r6g.metal
- r6g.xlarge
- r6gd.12xlarge
- r6gd.16xlarge
- r6gd.2xlarge
- r6gd.4xlarge
- r6gd.8xlarge
- r6gd.large
- r6gd.medium
- r6gd.metal
- r6gd.xlarge
- t1.micro
- t2.2xlarge
- t2.large
- t2.medium
- t2.micro
- t2.nano
- t2.small
- t2.xlarge
- t3.2xlarge
- t3.large
- t3.medium
- t3.micro
- t3.nano
- t3.small
- t3.xlarge
- t3a.2xlarge
- t3a.large
- t3a.medium
- t3a.micro
- t3a.nano
- t3a.small
- t3a.xlarge
- t4g.2xlarge
- t4g.large
- t4g.medium
- t4g.micro
- t4g.nano
- t4g.small
- t4g.xlarge
- u-12tb1.metal
- u-18tb1.metal
- u-24tb1.metal
- u-6tb1.metal
- u-9tb1.metal
- x1.16xlarge
- x1.32xlarge
- x1e.16xlarge
- x1e.2xlarge
- x1e.32xlarge
- x1e.4xlarge
- x1e.8xlarge
- x1e.xlarge
- z1d.12xlarge
- z1d.2xlarge
- z1d.3xlarge
- z1d.6xlarge
- z1d.large
- z1d.metal
- z1d.xlarge
ConstraintDescription: Must be a valid EC2 instance type
Description: EC2 instance type for the node instances
NodeVolumeSize:
Type: Number
Default: 20
Description: Node volume size
Subnets:
Type: "List<AWS::EC2::Subnet::Id>"
Description: The subnets where workers can be created.
VpcId:
Type: "AWS::EC2::VPC::Id"
Description: The VPC of the worker instances
Mappings:
PartitionMap:
aws:
EC2ServicePrincipal: "ec2.amazonaws.com"
aws-us-gov:
EC2ServicePrincipal: "ec2.amazonaws.com"
aws-cn:
EC2ServicePrincipal: "ec2.amazonaws.com.cn"
aws-iso:
EC2ServicePrincipal: "ec2.c2s.ic.gov"
aws-iso-b:
EC2ServicePrincipal: "ec2.sc2s.sgov.gov"
Conditions:
HasNodeImageId: !Not
- "Fn::Equals":
- !Ref NodeImageId
- ""
IMDSv1Disabled:
"Fn::Equals":
- !Ref DisableIMDSv1
- "true"
Resources:
NodeInstanceRole:
Type: "AWS::IAM::Role"
Properties:
AssumeRolePolicyDocument:
Version: "2012-10-17"
Statement:
- Effect: Allow
Principal:
Service:
- !FindInMap [PartitionMap, !Ref "AWS::Partition", EC2ServicePrincipal]
Action:
- "sts:AssumeRole"
ManagedPolicyArns:
- !Sub "arn:${AWS::Partition}:iam::aws:policy/AmazonEKSWorkerNodePolicy"
- !Sub "arn:${AWS::Partition}:iam::aws:policy/AmazonEKS_CNI_Policy"
- !Sub "arn:${AWS::Partition}:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
Path: /
NodeInstanceProfile:
Type: "AWS::IAM::InstanceProfile"
Properties:
Path: /
Roles:
- !Ref NodeInstanceRole
NodeSecurityGroup:
Type: "AWS::EC2::SecurityGroup"
Properties:
GroupDescription: Security group for all nodes in the cluster
Tags:
- Key: !Sub kubernetes.io/cluster/${ClusterName}
Value: owned
VpcId: !Ref VpcId
NodeSecurityGroupIngress:
Type: "AWS::EC2::SecurityGroupIngress"
DependsOn: NodeSecurityGroup
Properties:
Description: Allow node to communicate with each other
FromPort: 0
GroupId: !Ref NodeSecurityGroup
IpProtocol: "-1"
SourceSecurityGroupId: !Ref NodeSecurityGroup
ToPort: 65535
ClusterControlPlaneSecurityGroupIngress:
Type: "AWS::EC2::SecurityGroupIngress"
DependsOn: NodeSecurityGroup
Properties:
Description: Allow pods to communicate with the cluster API Server
FromPort: 443
GroupId: !Ref ClusterControlPlaneSecurityGroup
IpProtocol: tcp
SourceSecurityGroupId: !Ref NodeSecurityGroup
ToPort: 443
ControlPlaneEgressToNodeSecurityGroup:
Type: "AWS::EC2::SecurityGroupEgress"
DependsOn: NodeSecurityGroup
Properties:
Description: Allow the cluster control plane to communicate with worker Kubelet and pods
DestinationSecurityGroupId: !Ref NodeSecurityGroup
FromPort: 1025
GroupId: !Ref ClusterControlPlaneSecurityGroup
IpProtocol: tcp
ToPort: 65535
ControlPlaneEgressToNodeSecurityGroupOn443:
Type: "AWS::EC2::SecurityGroupEgress"
DependsOn: NodeSecurityGroup
Properties:
Description: Allow the cluster control plane to communicate with pods running extension API servers on port 443
DestinationSecurityGroupId: !Ref NodeSecurityGroup
FromPort: 443
GroupId: !Ref ClusterControlPlaneSecurityGroup
IpProtocol: tcp
ToPort: 443
NodeSecurityGroupFromControlPlaneIngress:
Type: "AWS::EC2::SecurityGroupIngress"
DependsOn: NodeSecurityGroup
Properties:
Description: Allow worker Kubelets and pods to receive communication from the cluster control plane
FromPort: 1025
GroupId: !Ref NodeSecurityGroup
IpProtocol: tcp
SourceSecurityGroupId: !Ref ClusterControlPlaneSecurityGroup
ToPort: 65535
NodeSecurityGroupFromControlPlaneOn443Ingress:
Type: "AWS::EC2::SecurityGroupIngress"
DependsOn: NodeSecurityGroup
Properties:
Description: Allow pods running extension API servers on port 443 to receive communication from cluster control plane
FromPort: 443
GroupId: !Ref NodeSecurityGroup
IpProtocol: tcp
SourceSecurityGroupId: !Ref ClusterControlPlaneSecurityGroup
ToPort: 443
NodeLaunchTemplate:
Type: "AWS::EC2::LaunchTemplate"
Properties:
LaunchTemplateData:
BlockDeviceMappings:
- DeviceName: /dev/xvda
Ebs:
DeleteOnTermination: true
VolumeSize: !Ref NodeVolumeSize
VolumeType: gp2
IamInstanceProfile:
Arn: !GetAtt NodeInstanceProfile.Arn
ImageId: !If
- HasNodeImageId
- !Ref NodeImageId
- !Ref NodeImageIdSSMParam
InstanceType: !Ref NodeInstanceType
KeyName: !Ref KeyName
SecurityGroupIds:
- !Ref NodeSecurityGroup
UserData: !Base64
"Fn::Sub": |
#!/bin/bash
set -o xtrace
/etc/eks/bootstrap.sh ${ClusterName} ${BootstrapArguments},
/opt/aws/bin/cfn-signal --exit-code $? \
--stack ${AWS::StackName} \
--resource NodeGroup \
--region ${AWS::Region}
MetadataOptions:
HttpPutResponseHopLimit : 2
HttpEndpoint: enabled
HttpTokens: !If
- IMDSv1Disabled
- required
- optional
NodeGroup:
Type: "AWS::AutoScaling::AutoScalingGroup"
Properties:
DesiredCapacity: !Ref NodeAutoScalingGroupDesiredCapacity
LaunchTemplate:
LaunchTemplateId: !Ref NodeLaunchTemplate
Version: !GetAtt NodeLaunchTemplate.LatestVersionNumber
MaxSize: !Ref NodeAutoScalingGroupMaxSize
MinSize: !Ref NodeAutoScalingGroupMinSize
Tags:
- Key: Name
PropagateAtLaunch: true
Value: !Sub ${ClusterName}-${NodeGroupName}-Node
- Key: !Sub kubernetes.io/cluster/${ClusterName}
PropagateAtLaunch: true
Value: owned
VPCZoneIdentifier: !Ref Subnets
UpdatePolicy:
AutoScalingRollingUpdate:
MaxBatchSize: 1
MinInstancesInService: !Ref NodeAutoScalingGroupDesiredCapacity
PauseTime: PT5M
Outputs:
NodeInstanceRole:
Description: The node instance role
Value: !GetAtt NodeInstanceRole.Arn
NodeSecurityGroup:
Description: The security group for the node group
Value: !Ref NodeSecurityGroup
NodeAutoScalingGroup:
Description: The autoscaling group
Value: !Ref NodeGroup