amazon-web-services amazon-ec2 amazon-ecs aws-fargate

How do health checks actually work in Amazon ECS?


I'm incredibly confused about how health checks work for a Docker container running in ECS using AWS Fargate. I think what makes this confusing is that there are three core components working in tandem, each of which I've seen have its own "health check" concerns:

  • ECS
  • EC2
  • ALB

First, if I check the health check docs, it makes it very clear that the built-in HEALTHCHECK in my docker image won't be used. However, I've seen comments from others on SO that they are used, so which is it?

Concerning the health check setup for ECS, I'm not seeing any way to configure health check commands when I create a Task Definition for my ECS service via Fargate in the AWS dashboard (web interface). I'm setting up the infrastructure using the CDK in C#, but for learning purposes I look at the AWS dashboard and edit things from there. I figure I need to learn how to set things up manually before I try to automate it.

I'll mention what I do see, but I'm not sure how it all pieces together.

  • ECS -> Clusters -> Click cluster name -> Click service name: I see "Healthy Targets" and "Unhealthy Targets"

  • ECS -> Clusters -> Click cluster name -> Click service name -> Deployments and events tab: There's a log that says "service X port 80 is unhealthy in target-group Y due to (reason Health checks failed with these codes: [404])". If I click the link for Y, it takes me to "EC2 -> Target groups -> Y (fargate)", which has a "Health checks" tab. There, I can click "Edit" and specify the health check "Path". This seems to eliminate the error.

  • ECS -> Task definitions -> Click task def name -> Click revision name -> JSON tab: No mention of "health" anywhere in this file

From the CDK, it looks like you can set up health checks after creating ApplicationLoadBalancedFargateService, at which point you can invoke ApplicationLoadBalancedFargateService.TargetGroup.ConfigureHealthCheck(), which takes an IHealthCheck that I haven't figured out how to create yet.

Also in the CDK there is QueueProcessingFargateService (not sure how that's different from the ALB version of FargateService) that has a HealthCheck property I can initialize, whereas the ALB version does not. It just adds more confusion. I don't necessarily care about QueueProcessingFargateService itself, but it does show up in the code example for HealthCheck in the CDK docs.

All of this is very confusing. The AWS web UI is absolutely horrid and difficult to navigate. I'm seeing a lot of conflicting information on SO and google search results in general about how to set up health checks. Can someone please help make sense of all of this?


Solution

  • Concerning the health check setup for ECS, I'm not seeing any way to configure health check commands when I create a Task Definition for my ECS service via Fargate in the AWS dashboard

    You would have to do that by editing the Task Definition JSON manually, instead of using the point-and-click features of the ECS web console. The ECS web console is currently missing a lot of features.

  • All of this is very confusing. The AWS web UI is absolutely horrid and difficult to navigate. I'm seeing a lot of conflicting information on SO and web search results in general about how to set up health checks. What can I try next?

    I recommend not using the web UI at all. Use the CDK, or use Terraform for creation of resources. Use the web UI for just looking at what was created.

    As for exactly how to set up health checks, it depends on what you are trying to do. If you are using a load balancer, then the target group health checks are required. You set those up on the load balancer's target group, and you can do that through the UI, since target groups live in the EC2 web console, which is fully featured. Target group health checks periodically send a network request to the ECS task and verify that it returns a proper response. That is why pointing the health check "Path" at a URL your app actually serves cleared the 404 error in your service events.
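To make the target group behaviour concrete, here is a rough Python sketch of what such a check amounts to. It is only an illustration, not how the load balancer is implemented: `probe` and `track_health` are hypothetical names, and the defaults used (success code 200, 5 consecutive successes to become healthy, 2 consecutive failures to become unhealthy, 5 second timeout) are my understanding of the ALB defaults, so check your own target group's settings.

```python
import http.client


def probe(host, port, path, healthy_codes=(200,), timeout=5.0):
    """Send one GET request and report whether the status code is accepted.

    http.client does not follow redirects, so a 301/302 counts as a failure
    unless that code is listed in healthy_codes.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status in healthy_codes
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, malformed response: all count as failed.
        return False
    finally:
        conn.close()


def track_health(results, healthy_threshold=5, unhealthy_threshold=2):
    """Replay a sequence of probe results and return the state after each one.

    The state only flips after enough consecutive results of the same kind.
    """
    state = "initial"
    ok_streak = bad_streak = 0
    states = []
    for ok in results:
        if ok:
            ok_streak, bad_streak = ok_streak + 1, 0
            if ok_streak >= healthy_threshold:
                state = "healthy"
        else:
            ok_streak, bad_streak = 0, bad_streak + 1
            if bad_streak >= unhealthy_threshold:
                state = "unhealthy"
        states.append(state)
    return states
```

With a container that returns 404 for `/`, calling `probe(host, 80, "/")` keeps returning `False`, and two such results in a row are enough for `track_health` to report "unhealthy", which matches the kind of event message quoted in the question.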

    If you are not using a load balancer, or if you just want extra health checks in addition to the target group checks, you can set up health check commands in the ECS task definition. These run a command inside the container periodically. You can't really set these up via the web UI, and even the higher-level CDK constructs probably mask this or make it less than obvious. This is an optional, advanced feature of ECS that most people don't use, and I believe you would have to drop down to lower-level CDK constructs if you were using the CDK.
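For reference, the container-level check lives under a `healthCheck` key in each entry of `containerDefinitions` in the task definition JSON. The Python sketch below just builds that fragment so you can see its shape. The helper name `container_health_check`, the container name, the image, and the `/health` path are made up for illustration, and the value ranges are the ones I recall from the ECS documentation, so verify them before relying on this.

```python
import json

# Allowed ranges as I understand them from the ECS docs (assumption).
_LIMITS = {
    "interval": (5, 300),     # seconds between checks
    "timeout": (2, 60),       # seconds before one check counts as failed
    "retries": (1, 10),       # consecutive failures before "unhealthy"
    "startPeriod": (0, 300),  # grace period after container start
}


def container_health_check(command, interval=30, timeout=5, retries=3,
                           start_period=0):
    """Build the healthCheck block for one container definition."""
    values = {
        "interval": interval,
        "timeout": timeout,
        "retries": retries,
        "startPeriod": start_period,
    }
    for key, value in values.items():
        low, high = _LIMITS[key]
        if not low <= value <= high:
            raise ValueError(f"{key} must be between {low} and {high}")
    # CMD-SHELL runs the command string with the container's default shell.
    return {"command": ["CMD-SHELL", command], **values}


container = {
    "name": "web",                  # hypothetical container name
    "image": "example/web:latest",  # hypothetical image
    "portMappings": [{"containerPort": 80}],
    "healthCheck": container_health_check(
        "curl -f http://localhost/health || exit 1",  # hypothetical path
        start_period=10,
    ),
}

# Print the fragment as it would appear inside the task definition JSON.
print(json.dumps({"containerDefinitions": [container]}, indent=2))
```

Note that the command runs inside the container, so whatever it calls (`curl` here) has to exist in the image. A non-zero exit code counts as a failed check.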