I'm new to AWS and I'm trying to provision an ECS cluster with a capacity provider via Terraform. My plan currently applies without errors, and I can see the capacity provider launching my instances, but those instances never register with the cluster, even though the capacity provider shows up on the cluster's edit page in the web console.
Here is my config for the cluster:
resource "aws_ecs_cluster" "cluster" {
name = "main"
depends_on = [
null_resource.iam_wait
]
}
data "aws_ami" "amazon_linux_2" {
most_recent = true
owners = ["amazon"]
filter {
name = "name"
values = ["amzn2-ami-ecs-hvm-*-x86_64-ebs"]
}
}
resource "aws_launch_configuration" "cluster" {
name = "cluster-${aws_ecs_cluster.cluster.name}"
image_id = data.aws_ami.amazon_linux_2.image_id
instance_type = "t2.small"
security_groups = [module.vpc.default_security_group_id]
iam_instance_profile = aws_iam_instance_profile.cluster.name
}
resource "aws_autoscaling_group" "cluster" {
name = aws_ecs_cluster.cluster.name
launch_configuration = aws_launch_configuration.cluster.name
vpc_zone_identifier = module.vpc.private_subnets
min_size = 3
max_size = 3
desired_capacity = 3
tag {
key = "ClusterName"
value = aws_ecs_cluster.cluster.name
propagate_at_launch = true
}
tag {
key = "AmazonECSManaged"
value = ""
propagate_at_launch = true
}
}
resource "aws_ecs_capacity_provider" "cluster" {
name = aws_ecs_cluster.cluster.name
auto_scaling_group_provider {
auto_scaling_group_arn = aws_autoscaling_group.cluster.arn
managed_scaling {
status = "ENABLED"
maximum_scaling_step_size = 1
minimum_scaling_step_size = 1
target_capacity = 3
}
}
}
resource "aws_ecs_cluster_capacity_providers" "cluster" {
cluster_name = aws_ecs_cluster.cluster.name
capacity_providers = [aws_ecs_capacity_provider.cluster.name]
default_capacity_provider_strategy {
base = 1
weight = 100
capacity_provider = aws_ecs_capacity_provider.cluster.name
}
}
The instance profile role has this policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeTags",
        "ecs:CreateCluster",
        "ecs:DeregisterContainerInstance",
        "ecs:DiscoverPollEndpoint",
        "ecs:Poll",
        "ecs:RegisterContainerInstance",
        "ecs:StartTelemetrySession",
        "ecs:Submit*",
        "ecr:GetAuthorizationToken",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage",
        "ecr:BatchCheckLayerAvailability",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "*"
    }
  ]
}
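For context, the role and instance profile themselves aren't shown above; they're wired up roughly along these lines (a simplified sketch, resource and name strings here are placeholders):

resource "aws_iam_role" "cluster" {
  name = "ecs-instance-role" # placeholder name

  # EC2 has to be allowed to assume the role, otherwise the ECS agent
  # on the instance can't get credentials from the instance profile.
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect    = "Allow"
        Principal = { Service = "ec2.amazonaws.com" }
        Action    = "sts:AssumeRole"
      }
    ]
  })
}

# The policy shown above is attached to this role (attachment resource omitted here).

resource "aws_iam_instance_profile" "cluster" {
  name = "ecs-instance-profile" # placeholder name
  role = aws_iam_role.cluster.name
}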
I've read that this can happen when the instances don't have the proper IAM role, but as far as I can tell my roles are set up correctly, and I can't find any permission errors anywhere.
Another strange thing I've noticed: if a cluster named "default" exists, the instances register themselves to that cluster instead, even though the capacity provider is still attached to my cluster.
Figured it out! The ECS agent joins whichever cluster is named in ECS_CLUSTER in /etc/ecs/ecs.config and falls back to a cluster called "default" when that's unset, which explains the behaviour above; the capacity provider only manages the Auto Scaling group, it doesn't tell the agent which cluster to register with. I just had to set user_data in my launch configuration as shown below.
resource "aws_launch_configuration" "cluster" {
name = "cluster-${aws_ecs_cluster.cluster.name}"
image_id = data.aws_ami.amazon_linux_2.image_id
instance_type = "t2.small"
security_groups = [module.vpc.default_security_group_id]
iam_instance_profile = aws_iam_instance_profile.cluster.name
user_data = "#!/bin/bash\necho ECS_CLUSTER=${aws_ecs_cluster.cluster.name} >> /etc/ecs/ecs.config"
}
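If the bootstrap script ever grows beyond one line, the same user_data can also be written as a heredoc; this is purely cosmetic and otherwise identical to the version above:

resource "aws_launch_configuration" "cluster" {
  name                 = "cluster-${aws_ecs_cluster.cluster.name}"
  image_id             = data.aws_ami.amazon_linux_2.image_id
  instance_type        = "t2.small"
  security_groups      = [module.vpc.default_security_group_id]
  iam_instance_profile = aws_iam_instance_profile.cluster.name

  # Same effect as the single-line string: point the ECS agent at the
  # right cluster before it starts.
  user_data = <<-EOF
    #!/bin/bash
    echo ECS_CLUSTER=${aws_ecs_cluster.cluster.name} >> /etc/ecs/ecs.config
  EOF
}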