
Amazon Elastic Container Service task starts with Fargate but does not start with an EC2 capacity provider


I was previously running my ECS task on Fargate, and it worked fine. The task definition used network mode awsvpc, and the cluster was not associated with any capacity provider.

Now I'm trying to use the EC2 launch type (network mode is still awsvpc, and the target group type is IP):

  1. I created an Auto Scaling group with a launch configuration, using ami-0da25582fb45be38c (amzn2-ami-ecs-hvm-2.0.20220822-x86_64-ebs) and a specific VPC ID / security group / subnets.

  2. I created a capacity provider in my ECS cluster and associated it with the Auto Scaling group from step 1.

  3. I re-created the ECS service and specified the capacity provider from step 2 as a "Custom capacity provider strategy" with weight=100 and base=1. I also specified the same VPC ID / security group / subnets that I used in step 1.

  4. I then set min=0, desired=1, max=1 on the Auto Scaling group. One EC2 instance successfully spins up and runs. I can SSH into it with the PEM key, and when I run docker ps -a, I see that the amazon/amazon-ecs-agent:latest container continuously starts, immediately exits, starts again after about 15 seconds, exits, and so on. I'm not sure whether this is expected.

  5. Finally, I set min=0, desired=1, max=1 on my ECS service. One task appears in the task list, but it is stuck in PROVISIONING and never changes state. Correspondingly, no EC2 instance is allocated to it.
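For reference, the service settings from steps 3 and 5 map onto the ECS CreateService API roughly as follows. This is a minimal Python sketch that only builds the request parameters and does not call AWS. The cluster, service, task definition, capacity provider, subnet and security group names are hypothetical placeholders, and the load balancer (IP target group) settings are left out. Note that the request carries subnets and security groups but no VPC ID, because the VPC is implied by the subnets.

```python
from __future__ import annotations

import json


def build_create_service_params(
    cluster: str,
    service_name: str,
    task_definition: str,
    capacity_provider: str,
    subnets: list[str],
    security_groups: list[str],
    desired_count: int = 1,
) -> dict:
    """Build keyword arguments shaped like an ECS CreateService request."""
    return {
        "cluster": cluster,
        "serviceName": service_name,
        "taskDefinition": task_definition,
        "desiredCount": desired_count,
        # Custom capacity provider strategy from step 3. There is no
        # "launchType" key: the CreateService docs say launchType must be
        # omitted when a capacityProviderStrategy is given.
        "capacityProviderStrategy": [
            {"capacityProvider": capacity_provider, "weight": 100, "base": 1}
        ],
        # awsvpc mode: subnets and security groups for the task's ENI.
        # There is no VPC ID field; the VPC follows from the subnets.
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": list(subnets),
                "securityGroups": list(security_groups),
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical placeholder names; replace with real identifiers.
    params = build_create_service_params(
        cluster="my-cluster",
        service_name="my-service",
        task_definition="my-task:1",
        capacity_provider="my-capacity-provider",
        subnets=["subnet-0123456789abcdef0"],
        security_groups=["sg-0123456789abcdef0"],
    )
    print(json.dumps(params, indent=2))
```

With boto3 installed and credentials configured, the same dictionary could be passed as `boto3.client("ecs").create_service(**params)`. The sketch leaves that call out so it runs offline.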

It seems that ecs-agent keeps restarting, and that's why the ECS service cannot place the task on the EC2 instance. Does anyone have a clue why ecs-agent restarts?

Update: I checked the Docker logs for ecs-agent:

$ docker logs -f ecs-agent
level=info time=2022-09-06T05:20:25Z msg="Successfully got ECS instance credentials from provider: EC2RoleProvider" module=instancecreds_linux.go
level=info time=2022-09-06T05:20:25Z msg="Starting Amazon ECS Agent" commit="a1a5ecbc" version="1.62.2"
level=info time=2022-09-06T05:20:25Z msg="Loading configuration"
level=warn time=2022-09-06T05:20:25Z msg="Unable to fetch user data: EC2MetadataError: failed to make EC2Metadata request\n\tstatus code: 404, request id: \ncaused by: <?xml version=\"1.0\" encoding=\"iso-8859-1\"?>\n<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\"\n\t\t \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\" lang=\"en\">\n <head>\n  <title>404 - Not Found</title>\n </head>\n <body>\n  <h1>404 - Not Found</h1>\n </body>\n</html>\n" module=config.go
level=info time=2022-09-06T05:20:25Z msg="Successfully got ECS instance credentials from provider: EC2RoleProvider" module=instancecreds_linux.go
level=info time=2022-09-06T05:20:25Z msg="Successfully got ECS instance credentials from provider: EC2RoleProvider" module=instancecreds_linux.go
level=info time=2022-09-06T05:20:25Z msg="Image excluded from cleanup" image="amazon/amazon-ecs-pause:0.1.0"
level=info time=2022-09-06T05:20:25Z msg="Image excluded from cleanup" image="amazon/amazon-ecs-pause:0.1.0"
level=info time=2022-09-06T05:20:25Z msg="Image excluded from cleanup" image="amazon/amazon-ecs-agent:latest"
level=info time=2022-09-06T05:20:25Z msg="Event stream ContainerChange start listening..." module=eventstream.go
level=info time=2022-09-06T05:20:25Z msg="Loading state!" module=state_manager.go
level=info time=2022-09-06T05:20:25Z msg="eni watcher has been initialized" module=watcher_linux.go
level=info time=2022-09-06T05:20:25Z msg="Registering Instance with ECS"
level=info time=2022-09-06T05:20:25Z msg="Remaining mem: 956" module=client.go
level=error time=2022-09-06T05:20:25Z msg="Unable to register as a container instance with ECS: ClientException: The referenced cluster was inactive." module=client.go
level=info time=2022-09-06T05:20:25Z msg="Remaining mem: 956" module=client.go
level=error time=2022-09-06T05:20:25Z msg="Unable to register as a container instance with ECS: ClientException: The referenced cluster was inactive." module=client.go
level=error time=2022-09-06T05:20:25Z msg="Error registering container instance" error="ClientException: The referenced cluster was inactive."
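The last lines show the agent failing to register the instance with the cluster, which would explain why the container exits and restarts. When the log is long, a small filter makes such failures easier to spot. The following is a minimal Python sketch that assumes the `key=value` / `key="quoted value"` layout seen in the output above. It is not an official parser for the agent's log format.

```python
from __future__ import annotations

import re

# Matches key=value pairs where the value is either a double-quoted string
# (which may contain backslash-escaped characters) or a bare token.
_PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_line(line: str) -> dict[str, str]:
    """Split one logfmt-style agent log line into a dict of fields."""
    fields = {}
    for key, value in _PAIR.findall(line):
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            # Drop the surrounding quotes and undo escaped quotes.
            value = value[1:-1].replace('\\"', '"')
        fields[key] = value
    return fields


def error_messages(log_text: str) -> list[str]:
    """Return distinct msg values of level=error lines, in first-seen order."""
    seen: list[str] = []
    for line in log_text.splitlines():
        fields = parse_line(line)
        if fields.get("level") == "error" and "msg" in fields:
            msg = fields["msg"]
            if msg not in seen:
                seen.append(msg)
    return seen
```

Fed the saved output of `docker logs ecs-agent`, this would reduce the log above to the two registration errors.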

The IAM role I'm using for the Auto Scaling group's instances has the following ECS permissions:

        "ecs:CreateCluster",
        "ecs:ListClusters",
        "ecs:DeregisterContainerInstance",
        "ecs:DiscoverPollEndpoint",
        "ecs:Poll",
        "ecs:RegisterContainerInstance",
        "ecs:StartTelemetrySession",
        "ecs:UpdateContainerInstancesState",
        "ecs:Submit*",

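For comparison, the actions above can be wrapped into a complete identity-based policy document. The following is a minimal Python sketch. The `Version`, `Effect` and `Resource` values are assumptions that do not appear in the fragment above. The error in the log is a `ClientException` about the cluster's state rather than an access-denied error, which suggests the permissions are not what blocks registration. AWS also publishes the managed policy `AmazonEC2ContainerServiceforEC2Role` for the container instance role.

```python
import json

# The ECS actions listed in the question, in the same order.
ECS_INSTANCE_ACTIONS = [
    "ecs:CreateCluster",
    "ecs:ListClusters",
    "ecs:DeregisterContainerInstance",
    "ecs:DiscoverPollEndpoint",
    "ecs:Poll",
    "ecs:RegisterContainerInstance",
    "ecs:StartTelemetrySession",
    "ecs:UpdateContainerInstancesState",
    "ecs:Submit*",
]


def build_policy(actions: list) -> dict:
    """Wrap actions in a minimal identity-based policy document.

    Assumption: one Allow statement scoped to all resources ("*").
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": list(actions), "Resource": "*"}
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_policy(ECS_INSTANCE_ACTIONS), indent=2))
```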
Maybe the problem is the AMI, and I should use another one? However, the ecs-init version is 1.62.2, so it's up to date.


Solution

  • It seems the ECS cluster that the agent is trying to join doesn't exist or is no longer active. Check the value of ECS_CLUSTER in /etc/ecs/ecs.config. If it's not set, the agent falls back to a cluster named default, and most likely you don't have an active cluster named default in ECS.

    Edit: You can either set the proper value in the AMI and launch new instances into the cluster from that AMI, or update the user data in the launch template (or launch configuration) so the value is set when new instances start.

    More about the ECS container agent configuration options: https://github.com/aws/amazon-ecs-agent/blob/master/README.md
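The fix above can be illustrated with a minimal Python sketch under these assumptions:

* `ecs.config` holds plain `KEY=value` lines.
* A later assignment overrides an earlier one.
* `my-cluster` is a hypothetical cluster name.

`effective_cluster` reports which cluster the agent would try to join, falling back to `default` as described in the answer. `build_user_data` produces the common one-line user data script that appends `ECS_CLUSTER` to `/etc/ecs/ecs.config` at boot. Launch template user data is supplied base64-encoded, which is why the last helper exists.

```python
import base64

CONFIG_PATH = "/etc/ecs/ecs.config"  # file named in the answer
DEFAULT_CLUSTER = "default"  # agent's fallback when ECS_CLUSTER is unset


def effective_cluster(config_text: str) -> str:
    """Return the cluster the agent would try to join, given ecs.config text.

    Assumes simple KEY=value lines; blank lines and '#' comments are skipped,
    and a later ECS_CLUSTER assignment overrides an earlier one.
    """
    cluster = None
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "ECS_CLUSTER":
            cluster = value.strip()
    # An unset or empty value means the agent uses the default cluster.
    return cluster or DEFAULT_CLUSTER


def build_user_data(cluster: str) -> str:
    """Shell user data that appends ECS_CLUSTER to the agent config file."""
    return "#!/bin/bash\n" f"echo ECS_CLUSTER={cluster} >> {CONFIG_PATH}\n"


def encode_for_launch_template(user_data: str) -> str:
    """Base64-encode user data for a launch template's UserData field."""
    return base64.b64encode(user_data.encode("utf-8")).decode("ascii")


if __name__ == "__main__":
    # "my-cluster" is a hypothetical name; use your real cluster name.
    print(build_user_data("my-cluster"))
```

On an affected instance, passing the contents of `/etc/ecs/ecs.config` to `effective_cluster` and getting back `default` would match the diagnosis above. Because user data normally runs only at an instance's first boot, instances that are already running would need to be replaced to pick up the new value, as the answer suggests.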