I deploy my application on AWS ECS using EC2 instances. It is managed through Terraform, and I have not faced an issue like this before.
Recently I made some changes to my backend that introduced a bug which could cause a timeout. Because of that, my backend crashed.
The event messages shown on the deployment tab of the service are as follows:
service backend instance i-000a000b0b0000c port 8000 is unhealthy in target-group backend-prod-backend L due to (reason Request timed out)
service backend has stopped 1 running tasks: task a000000f000f00000000ffef00a0f0af.
service backend deregistered 1 targets in target-group backend-prod-backend
(service backend, taskSet ecs-svc/0000000000000000000) has begun draining connections on 1 tasks.
service backend deregistered 1 targets in target-group backend-prod-backend
service backend has started 1 tasks: task a111111f111f111231241ffef00a0f0af.
As can be seen, the task was stopped and a new task was started. However, the new task got stuck in the PROVISIONING status and stayed pending.
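For reference, the state of the stuck task and of the container instances can be inspected through the ECS API. Below is a minimal boto3 sketch; the cluster and service names (backend-prod, backend) are placeholders for the real ones.

```python
import boto3

ecs = boto3.client("ecs")  # assumes credentials/region are already configured
CLUSTER, SERVICE = "backend-prod", "backend"  # placeholder names

# Status of the tasks the service is currently trying to run.
task_arns = ecs.list_tasks(cluster=CLUSTER, serviceName=SERVICE)["taskArns"]
if task_arns:
    for task in ecs.describe_tasks(cluster=CLUSTER, tasks=task_arns)["tasks"]:
        print(task["taskArn"], task["lastStatus"], task["desiredStatus"])

# Whether the container instances are registered and their ECS agent is connected.
ci_arns = ecs.list_container_instances(cluster=CLUSTER)["containerInstanceArns"]
if ci_arns:
    for ci in ecs.describe_container_instances(
        cluster=CLUSTER, containerInstances=ci_arns
    )["containerInstances"]:
        print(ci["ec2InstanceId"], "agent connected:", ci["agentConnected"])
```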
I tried restarting the EC2 instances. I even deleted the whole ECS cluster and all the instances and re-ran Terraform so it could recreate everything, but the new task still gets stuck in the same status.
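A lighter-weight restart than recreating the infrastructure is to force a new deployment on the service, which makes ECS replace the tasks in place. Roughly (boto3 sketch; cluster and service names are placeholders):

```python
import boto3

ecs = boto3.client("ecs")

# Ask ECS to replace the service's tasks in place, without touching
# the cluster or the EC2 instances; the scheduler stops the old tasks
# and starts fresh ones from the same task definition.
ecs.update_service(
    cluster="backend-prod",  # placeholder cluster name
    service="backend",       # placeholder service name
    forceNewDeployment=True,
)
```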
I have the same version deployed in another environment with the same configuration, and there it was able to restart.
I know this might be too specific, so I'm not hoping for an answer that solves the issue, but mostly for suggestions on how to deal with a case like this. How can I debug it? Is there a better way to make it restart?
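For completeness, the service's recent events and the stop reasons of previously stopped tasks can be pulled from the API like this (boto3 sketch; names are placeholders again). The ECS agent log at /var/log/ecs/ecs-agent.log on the instance itself is also worth a look.

```python
import boto3

ecs = boto3.client("ecs")
CLUSTER, SERVICE = "backend-prod", "backend"  # placeholder names

# Recent scheduler events for the service (the same messages as the console shows).
service = ecs.describe_services(cluster=CLUSTER, services=[SERVICE])["services"][0]
for event in service["events"][:10]:
    print(event["createdAt"], event["message"])

# Stop reasons of tasks the service has already stopped.
stopped = ecs.list_tasks(cluster=CLUSTER, serviceName=SERVICE, desiredStatus="STOPPED")
if stopped["taskArns"]:
    for task in ecs.describe_tasks(cluster=CLUSTER, tasks=stopped["taskArns"])["tasks"]:
        print(task["taskArn"], "->", task.get("stoppedReason"))
```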
Answering my own question, as I was able to resolve this. The container instance's ECS agent had automatically upgraded to a newer version, which required more resources (memory/CPU) to run. I had to release some of the resources reserved by the ECS task so that the agent could start the task.
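For anyone running into the same thing: the mismatch can be confirmed by checking the agent version each instance registered with and comparing the instance's registered vs. remaining CPU/memory against the task's reservation. A rough boto3 sketch follows (the cluster name is a placeholder); in my case, releasing resources meant lowering the task's memory/CPU reservation in the Terraform-managed task definition so the instance had enough headroom to place it.

```python
import boto3

ecs = boto3.client("ecs")
CLUSTER = "backend-prod"  # placeholder cluster name

ci_arns = ecs.list_container_instances(cluster=CLUSTER)["containerInstanceArns"]
for ci in ecs.describe_container_instances(
    cluster=CLUSTER, containerInstances=ci_arns
)["containerInstances"]:
    # Agent version the instance registered with.
    print(ci["ec2InstanceId"], "agent", ci["versionInfo"]["agentVersion"])
    registered = {r["name"]: r.get("integerValue") for r in ci["registeredResources"]}
    remaining = {r["name"]: r.get("integerValue") for r in ci["remainingResources"]}
    for name in ("CPU", "MEMORY"):
        # If the task's reservation exceeds the remaining value here,
        # ECS cannot place the task and it stays pending.
        print(f"  {name}: {remaining.get(name)} of {registered.get(name)} free")
```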