
EKS nodes fail when launched through a launch template (Terraform)


When I launch the node group without a launch template, everything works fine, but when I launch it through a launch template, I get connection issues within the cluster.

More specifically, the aws-node pod fails with this error:

{"level":"info","caller":"/usr/local/go/src/runtime/proc.go:225","msg":"timeout: failed to connect service \":50051\" within 5s"}

Digging through other posts here, many people point to IAM role issues, but my IAM role is fine; I've been using the same role to launch many other nodes and they all joined successfully.
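The role has the three standard managed policies for EKS worker nodes attached; the attachments look roughly like this (illustrative sketch, not my exact code — the resource names match the depends_on block further down):

resource "aws_iam_role_policy_attachment" "nodes-AmazonEKSWorkerNodePolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
  role       = aws_iam_role.nodes.name
}

resource "aws_iam_role_policy_attachment" "nodes-AmazonEKS_CNI_Policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
  role       = aws_iam_role.nodes.name
}

resource "aws_iam_role_policy_attachment" "nodes-AmazonEC2ContainerRegistryReadOnly" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
  role       = aws_iam_role.nodes.name
}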

Here are my Terraform files.

The node group:

resource "aws_eks_node_group" "eth-staking-nodes" {
  cluster_name    = aws_eks_cluster.staking.name
  node_group_name = "ethstaking-nodes-testnet"
  node_role_arn   = aws_iam_role.nodes.arn

  subnet_ids = [
    data.aws_subnet.private-1.id,
    data.aws_subnet.private-2.id
  ]

  scaling_config {
    desired_size = 1
    max_size     = 5
    min_size     = 0
  }

  update_config {
    max_unavailable = 1
  }

  labels = {
    role = "general"
  }

  launch_template {
    version = aws_launch_template.staking.latest_version
    id      = aws_launch_template.staking.id
  }

  depends_on = [
    aws_iam_role_policy_attachment.nodes-AmazonEKSWorkerNodePolicy,
    aws_iam_role_policy_attachment.nodes-AmazonEKS_CNI_Policy,
    aws_iam_role_policy_attachment.nodes-AmazonEC2ContainerRegistryReadOnly,
  ]
}

The launch template:

esource "aws_launch_template" "staking" {
  name          = "${var.stage}-staking-node-launch-template"
  instance_type = "m5.2xlarge"
  image_id      = "ami-08712c7468e314435"

  key_name = "nivpem"
  
  block_device_mappings {
    device_name = "/dev/xvda"

    ebs {
      volume_size = 450
      volume_type = "gp2"
    }
  }

  lifecycle {
    create_before_destroy = false
  }

  vpc_security_group_ids = [aws_security_group.eks-ec2-sg.id]
  user_data = base64encode(templatefile("${path.module}/staking_userdata.sh", {
    password = "********"
  }))

  tags = {
    "eks:cluster-name"   = aws_eks_cluster.staking.name
    "eks:nodegroup-name" = "ethstaking-nodes-testnet"
  }

  tag_specifications {
    resource_type = "instance"

    tags = {
      Name                 = "${var.stage}-staking-node"
      "eks:cluster-name"   = aws_eks_cluster.staking.name
      "eks:nodegroup-name" = "ethstaking-nodes-testnet"
    }
  }
}

The security group:

resource "aws_security_group" "eks-ec2-sg" {
  name        = "eks-ec2-sg-staking-testnet"
  vpc_id      = data.aws_vpc.vpc.id

  ingress {
    from_port        = 0
    to_port          = 0
    protocol         = "-1"
    cidr_blocks      = ["0.0.0.0/0"]
  }

  egress {
    from_port        = 0
    to_port          = 0
    protocol         = "-1"
    cidr_blocks      = ["0.0.0.0/0"]
    ipv6_cidr_blocks = ["::/0"]
  }

  tags = {
    Name = "allow_tls"
  }
}

Solution

  • Consider adding a vpc_config block with endpoint_private_access (and endpoint_public_access) set to true in your aws_eks_cluster resource. That should make it work, since you're launching the nodes in private subnets.
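A rough sketch of what that could look like on the cluster resource (the cluster name, role, and subnet arguments here are assumptions, not taken from your config):

resource "aws_eks_cluster" "staking" {
  name     = "staking-testnet"          # assumed cluster name
  role_arn = aws_iam_role.cluster.arn   # assumed cluster service role

  vpc_config {
    # Let worker nodes in the private subnets reach the API server over
    # the private endpoint, while keeping the public endpoint open.
    endpoint_private_access = true
    endpoint_public_access  = true

    subnet_ids = [
      data.aws_subnet.private-1.id,
      data.aws_subnet.private-2.id
    ]
  }
}

Without private endpoint access, nodes in private subnets can only reach the API server if they have a route out to the public endpoint (for example through a NAT gateway).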