
Make ECS wait for load balancer health checks before draining old task?


I have a service in ECS. When I deploy a new version of the service's task definition, ECS adds the new task, registers it with the target-group, unregisters the old task from the target-group, and then stops the old task.

The problem is that the new task has not yet passed enough consecutive load balancer health checks to be considered healthy, so there are now no healthy targets in the target-group. Only 1-2 minutes later does the load balancer consider the new task healthy.
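The length of that gap follows from how ALB health checks work. A newly registered target is only marked healthy after it passes a number of consecutive checks, spaced one interval apart. Below is a rough estimate, assuming the documented ALB defaults for HTTP health checks (30-second interval, healthy threshold of 5). The function name is hypothetical, and the estimate ignores how quickly the first check runs after registration.

```python
# Back-of-the-envelope estimate of how long an ALB needs before it marks a
# newly registered target healthy. ASSUMPTION: default HTTP health-check
# settings (30 s interval, 5 consecutive successes); your target group may differ.

def seconds_until_healthy(interval_seconds: int = 30, healthy_threshold: int = 5) -> int:
    """Approximate upper bound on the time to pass `healthy_threshold` checks in a row."""
    if interval_seconds <= 0 or healthy_threshold <= 0:
        raise ValueError("interval and threshold must be positive")
    return interval_seconds * healthy_threshold


if __name__ == "__main__":
    print(seconds_until_healthy())       # 150 (defaults)
    print(seconds_until_healthy(10, 2))  # 20 (a more aggressive target group)
```

Lowering the interval or threshold on the target group shortens the window. It does not remove the window, because ECS still stops the old task before the new one has passed its checks.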

To summarize:

  1. Task1 is running
  2. Push updated task definition
  3. ECS begins new deployment of service with updated task definition, creating Task2
  4. Task2 is started in Docker
  5. ECS reports that Task2 is RUNNING (because the Docker container has started)
  6. ECS registers Task2 in the target-group
  7. ECS unregisters Task1 from the target-group
  8. ECS stops Task1
  9. !!! There are no healthy targets in the target-group !!!
  10. (several minutes later) Task2 passes enough consecutive ALB health checks to be considered healthy, traffic flows as expected again
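The steps above can be replayed with a toy model that counts how many registered targets are healthy after each step. `Target`, `replay_rollout` and the `wait_for_healthy` switch are hypothetical names used only for this illustration, and the model does not talk to ECS. `wait_for_healthy=True` stands in for the ordering the question is asking for, in which Task2 passes its health checks before Task1 is deregistered.

```python
from dataclasses import dataclass


@dataclass
class Target:
    name: str
    registered: bool = False  # present in the target-group
    healthy: bool = False     # passing ALB health checks


def replay_rollout(wait_for_healthy: bool) -> list[tuple[str, int]]:
    """Replay the rollout and record the healthy, registered target count after each step."""
    task1 = Target("Task1", registered=True, healthy=True)
    task2 = Target("Task2")
    timeline: list[tuple[str, int]] = []

    def record(step: str) -> None:
        count = sum(1 for t in (task1, task2) if t.registered and t.healthy)
        timeline.append((step, count))

    record("Task1 is running")
    task2.registered = True              # Task2 is registered but not yet healthy
    record("ECS registers Task2")
    if wait_for_healthy:                 # hypothetical: wait for ALB checks first
        task2.healthy = True
        record("Task2 passes ALB health checks")
    task1.registered = False
    record("ECS deregisters Task1")
    task1.healthy = False
    record("ECS stops Task1")
    if not task2.healthy:                # as described in the question
        task2.healthy = True
        record("Task2 passes ALB health checks")
    return timeline


if __name__ == "__main__":
    for step, healthy in replay_rollout(wait_for_healthy=False):
        print(f"{healthy} healthy  {step}")
```

With `wait_for_healthy=False` the count drops to 0 as soon as Task1 is deregistered, which matches step 9. With `wait_for_healthy=True` it never falls below 1.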

How can I avoid briefly not having any healthy targets in the target-group?


Solution

  • Please look at the available ECS Service parameters.

    Specifically, look at minimumHealthyPercent. The default settings for minimumHealthyPercent should prevent the behavior you describe, so I'm wondering whether you have modified the default value. To prevent the behavior you are seeing, I would set minimumHealthyPercent to 100.

    You should also look at the target group's deregistration_delay setting. If you changed it to a low value, that may explain the behavior you are seeing. Alternatively, you may need to set it higher than the default of 300 seconds.
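Both settings can be sketched in Python. minimumHealthyPercent and maximumPercent live in the service's deploymentConfiguration. ECS rounds the lower bound (desiredCount × minimumHealthyPercent / 100) up and the upper bound (desiredCount × maximumPercent / 100) down. A minimumHealthyPercent of 100 therefore forces ECS to keep the old task until a replacement is healthy. A maximumPercent above 100 (200 here) leaves room to run both at once. deregistration_delay is the Terraform argument name. The underlying target group attribute key is `deregistration_delay.timeout_seconds`, which accepts 0 to 3600 seconds.

Assumptions: you use boto3, and the helper function names, cluster, service and ARN in the comments are hypothetical. The code only builds request parameters and does not call AWS.

```python
# Builds request parameters only; nothing here calls AWS.
# ASSUMPTION: boto3 is the client; helper names below are hypothetical.


def deployment_bounds(desired_count: int, minimum_healthy_percent: int,
                      maximum_percent: int) -> tuple[int, int]:
    """Return (tasks that must stay healthy, max tasks allowed) during a rolling deployment."""
    lower = -(-desired_count * minimum_healthy_percent // 100)  # rounded up
    upper = desired_count * maximum_percent // 100              # rounded down
    return lower, upper


def deregistration_delay_attributes(timeout_seconds: int = 300) -> list[dict[str, str]]:
    """Attributes list for elbv2 modify_target_group_attributes."""
    if not 0 <= timeout_seconds <= 3600:
        raise ValueError("deregistration delay must be between 0 and 3600 seconds")
    return [{"Key": "deregistration_delay.timeout_seconds", "Value": str(timeout_seconds)}]


# Keep the old task until a replacement is healthy, and leave room to run both.
deployment_configuration = {"minimumHealthyPercent": 100, "maximumPercent": 200}

# With boto3 these would be sent roughly as:
#   boto3.client("ecs").update_service(
#       cluster="my-cluster", service="my-service",      # hypothetical names
#       deploymentConfiguration=deployment_configuration)
#   boto3.client("elbv2").modify_target_group_attributes(
#       TargetGroupArn="<your-target-group-arn>",          # hypothetical placeholder
#       Attributes=deregistration_delay_attributes(300))

if __name__ == "__main__":
    for min_pct in (0, 50, 100):
        print(min_pct, deployment_bounds(1, min_pct, 200))
```

Under this rounding rule, with a single task only minimumHealthyPercent = 0 gives a lower bound of 0 (the loop prints `(0, 2)` for 0 and `(1, 2)` for 50 and 100). A lower bound of 0 is what lets ECS stop Task1 before Task2 is healthy.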