amazon-ecs  aws-fargate  health-check

Fargate service stops because "ELB health check" fails


I'm brand new to the AWS world and I have an issue with my Fargate task: it keeps being stopped because its health check apparently fails:

Task failed ELB health checks in (target-group arn:aws:elasticloadbalancing:REGION:IDENTIFIER:targetgroup/TG_NAME/TG_ID)

I've read a lot of posts and run a lot of tests before posting this, and now I'm hoping I'm missing something that is obvious to someone more familiar with AWS.

Here is where I am:

My service (Fargate) is in a security group with these inbound rules:

TYPE         PROTOCOL  PORT_RANGE  SOURCE
--------------------------------------------
HTTP         TCP       80          0.0.0.0/0  // normally, only this one
All traffic  All       All         0.0.0.0/0  // but because I'm quite desperate
All traffic  All       All         ::/0

The associated Target Group has a health check defined like this:

Protocol: HTTP
Route: /awshealth
Port: Traffic port
...
Success codes: 200
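
For readers who manage the target group from code rather than the console, the settings listed above could be expressed roughly as follows. This is only a sketch: the helper name `build_health_check_args` is hypothetical, the ARN is the placeholder from the error message, and only the fields shown in the question are set (the elided ones are left at whatever the target group already uses).

```python
def build_health_check_args(target_group_arn):
    """Build keyword arguments for boto3's elbv2.modify_target_group.

    Only the fields visible in the question are included.
    """
    return {
        "TargetGroupArn": target_group_arn,
        "HealthCheckProtocol": "HTTP",
        "HealthCheckPath": "/awshealth",      # "Route" in the question
        "HealthCheckPort": "traffic-port",    # "Traffic port" in the console
        "Matcher": {"HttpCode": "200"},       # "Success codes"
    }


# Placeholder ARN copied from the error message, not a real resource.
args = build_health_check_args(
    "arn:aws:elasticloadbalancing:REGION:IDENTIFIER:targetgroup/TG_NAME/TG_ID"
)
print(args)

# With boto3 installed and AWS credentials configured, the call would be:
#   import boto3
#   boto3.client("elbv2").modify_target_group(**args)
```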

From my logs, I know that my /awshealth route is called and answers with a status 200.

Nevertheless, my task stops after some time because of a health check issue (whereas until that moment I can reach my server through the public DNS name of my load balancer).

Could anyone help me fix this?

Thanks in advance!

Note 1: My load balancer is associated with all my Availability Zones (and all my subnets), and shares the same VPC and the same security groups as my service.

Note 2: The service needs a few minutes to start, so I've added a health check grace period of 300 seconds to my service.


Solution

  • It was a memory issue.

    The server was starting correctly (which explains the 200 statuses on my /awshealth route), but a few minutes later the CPU was running at 100% and the server shut down, which caused my service to stop.

    I've just added some memory and everything is OK now.
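
One way to check for this kind of failure is to look at why the stopped task's containers exited, since the ELB message alone only says that the target stopped answering. The sketch below assumes you already have one task entry from the response of boto3's `ecs.describe_tasks`. The helper name `explain_stopped_task` and the sample data are hypothetical and only illustrate the shape of the response.

```python
def explain_stopped_task(task):
    """Collect human-readable stop reasons from one describe_tasks task entry."""
    findings = []
    if task.get("stoppedReason"):
        findings.append("task: " + task["stoppedReason"])
    for container in task.get("containers", []):
        name = container.get("name", "?")
        if container.get("reason"):
            findings.append(name + ": " + container["reason"])
        # 137 = 128 + 9 (SIGKILL); frequently, though not always, an OOM kill.
        if container.get("exitCode") == 137:
            findings.append(name + ": exit code 137 (SIGKILL, often out of memory)")
    return findings


# Illustrative sample only, not taken from the original post.
sample_task = {
    "stoppedReason": "Task failed ELB health checks in (target-group ...)",
    "containers": [
        {
            "name": "web",
            "exitCode": 137,
            "reason": "OutOfMemoryError: Container killed due to memory usage",
        }
    ],
}

for line in explain_stopped_task(sample_task):
    print(line)

# With boto3 installed and AWS credentials configured, a real task entry
# would come from (cluster name and task ARN are yours to fill in):
#   import boto3
#   resp = boto3.client("ecs").describe_tasks(cluster=CLUSTER, tasks=[TASK_ARN])
#   explain_stopped_task(resp["tasks"][0])
```

If the container reason or exit code points to memory, raising the task's memory setting (as done here) is the usual next step.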