amazon-web-services docker amazon-ecs aws-ecs

Ensure ECS only kills old tasks when new ones are ready

We have Docker-based ECS services where, once the process is up, it needs to synchronize application state before it is ready to serve requests. This takes several seconds after the process starts.

When using ECS Services, changing the task definition version triggers a rolling replacement of the tasks (good), but it happens too quickly. As soon as a new task reaches the RUNNING state, the next old task is killed. But RUNNING only means the process has started; it doesn't mean the task has met its own internal requirements to do work, which in this case means being ready to serve requests.

This entire update process happens so quickly that in some cases, all the old tasks are killed before any of the new tasks have finished loading their state, and we end up with an outage.

What is the correct way to ensure that ECS Services doesn't terminate the old, warm tasks until the new tasks are actually warm and fully online, rather than merely having a running container process?

Solution

You can control the speed at which a deployment proceeds by setting the following parameters:

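deploymentConfiguration (specifically minimumHealthyPercent in your case; setting it to 100 with maximumPercent above 100, e.g. 200, makes ECS start replacement tasks before it stops any old ones)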
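enabling health checks (load balancer health checks if the service is behind a load balancer, or container health checks), so that a new task only counts as healthy once it passes its checks rather than as soon as it is RUNNING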
setting healthCheckGracePeriodSeconds (for load balancer health checks) or startPeriod (for container health checks) to cover the startup synchronization time, so that checks failing during warm-up don't get the new task marked unhealthy and replaced.
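A minimal sketch, in Python, of how these settings fit together. It only builds the request payloads (shaped for boto3's ECS `update_service` call and the `healthCheck` field of a container definition in `register_task_definition`) so it runs without AWS credentials. The cluster, service, and task definition names, the `/ready` endpoint on port 8080, the presence of `curl` in the image, and the 60-second warm-up are all hypothetical placeholders. The helper `deployment_bounds` is likewise illustrative: it reproduces the documented rounding (minimum healthy rounded up, maximum rounded down) to show how many tasks ECS keeps and allows during a rollout.

```python
import json

# Hypothetical values: replace with your own cluster, service, and task definition.
CLUSTER = "my-cluster"
SERVICE = "my-service"
TASK_DEFINITION = "my-task:42"

# Assumed time the app needs to synchronize state after the process starts.
WARMUP_SECONDS = 60


def deployment_bounds(desired_count: int, minimum_healthy_percent: int,
                      maximum_percent: int) -> tuple[int, int]:
    """Return (min healthy tasks, max total tasks) during a rolling deployment.

    ECS rounds the lower bound up and the upper bound down; integer
    arithmetic avoids floating-point surprises.
    """
    min_healthy = -(-desired_count * minimum_healthy_percent // 100)  # ceiling
    max_total = desired_count * maximum_percent // 100  # floor
    return min_healthy, max_total


# Container-level health check (containerDefinitions[].healthCheck).
# Assumes the image has curl and the app exposes a hypothetical /ready
# endpoint that only returns 2xx once state synchronization has finished.
container_health_check = {
    "command": ["CMD-SHELL", "curl -f http://localhost:8080/ready || exit 1"],
    "interval": 10,                 # seconds between checks
    "timeout": 5,                   # seconds before one check counts as failed
    "retries": 3,                   # consecutive failures before UNHEALTHY
    "startPeriod": WARMUP_SECONDS,  # failures during warm-up don't count
}

# Keyword arguments for ecs_client.update_service(**update_service_kwargs).
update_service_kwargs = {
    "cluster": CLUSTER,
    "service": SERVICE,
    "taskDefinition": TASK_DEFINITION,
    "deploymentConfiguration": {
        # Never drop below the full desired count of healthy tasks...
        "minimumHealthyPercent": 100,
        # ...and allow up to double, so new tasks start alongside old ones.
        "maximumPercent": 200,
    },
    # Ignore load balancer health check failures while a new task warms up
    # (relevant when the service is behind a load balancer).
    "healthCheckGracePeriodSeconds": WARMUP_SECONDS,
}

if __name__ == "__main__":
    print(deployment_bounds(4, 100, 200))
    print(json.dumps(update_service_kwargs, indent=2))
```

With 4 desired tasks, `minimumHealthyPercent=100` and `maximumPercent=200` give bounds of (4, 8): ECS must keep 4 healthy tasks at all times but may run up to 8, so it launches replacements first and stops an old task only after a new one passes its health checks.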