
AWS ECS: unable to run more than 10 tasks


I have an ECS cluster with, say, 20 registered instances.

I have 3 task definitions to solve a big data problem.

Task 1: Split Task: This starts a docker container whose container definition has an entrypoint that runs a script called HPC-Split. This script splits the big data into, say, 5 parts on a mounted EFS. The number of tasks (count) for this task is 1.

Task 2: Run Task: This starts another docker container with an entrypoint that runs a script called HPC-script, which processes each split part. The number of tasks (count) selected for this is 5, so that the parts are processed in parallel.

Task 3: Merge Task: This starts a third docker container with an entrypoint that runs a script called HPC-Merge, which merges the outputs from all the parts. Again, the number of tasks (count) that we need to run for this is 1.

Now the AWS service limits page (https://docs.aws.amazon.com/AmazonECS/latest/developerguide/service_limits.html) says that the maximum number of tasks (count) we can run is 10. So at the moment we are able to run only 10 processes in parallel. Meaning: split the file (1 task runs on one instance), run the process (tasks run on 10 instances), merge the file (1 task runs on one instance).

The limit of 10 restricts how far we can parallelize our processing, and I don't know how to get around it. I am surprised by this limit, because there is surely a need to run long-running processes on more than 10 instances in a cluster.

Can you please give me some pointers on how to get around this limit, or on how to use ECS optimally to run, say, 20 tasks in parallel? The placement I use is 'One task per host', because the process uses all cores on one host.

How can I architect this better with ECS?


Solution

  • Number of tasks launched (count) per run-task

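    This is the maximum number of tasks that can be launched per invocation of the run-task API, not a cap on how many tasks can run in the cluster. To launch more tasks, call the run-task API again; for example, 20 parallel tasks would take two calls with a count of 10 each.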
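
    As a rough illustration of calling run-task repeatedly, here is a minimal Python sketch. It assumes a boto3-style ECS client whose run_task method accepts cluster, taskDefinition, count and placementConstraints; the helper names batch_sizes and launch_tasks, and the default of 10 per call, are made up for this example. The distinctInstance constraint is assumed to correspond to the 'One task per host' placement from the question, and whether it is enforced across separate calls is worth verifying against your own setup.

```python
def batch_sizes(total, per_call=10):
    """Split `total` tasks into chunks of at most `per_call` (the per-call limit)."""
    return [min(per_call, total - start) for start in range(0, total, per_call)]


def launch_tasks(ecs_client, cluster, task_definition, total, per_call=10):
    """Call run_task once per chunk until `total` tasks have been requested.

    `ecs_client` is assumed to behave like boto3.client("ecs").
    Returns (task_arns, failures) gathered from every response.
    """
    task_arns, failures = [], []
    for count in batch_sizes(total, per_call):
        response = ecs_client.run_task(
            cluster=cluster,
            taskDefinition=task_definition,
            count=count,
            # Assumption: mirrors the "One task per host" placement in the question.
            placementConstraints=[{"type": "distinctInstance"}],
        )
        task_arns.extend(task["taskArn"] for task in response.get("tasks", []))
        # run_task reports tasks it could not place in "failures" instead of raising.
        failures.extend(response.get("failures", []))
    return task_arns, failures
```

    With boto3 installed and credentials configured, a call might look like launch_tasks(boto3.client("ecs"), "my-cluster", "hpc-run", 20), where the cluster and task definition names are placeholders. Checking the returned failures list matters here, because a task that cannot be placed (for example, no free instance) shows up there rather than as an exception.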