amazon-web-services, docker, amazon-ecs

AWS ECS: How to run hundreds of containers with different code


I've been studying AWS ECS, but I cannot understand how to architect the following scenario:

I'm trying to run hundreds of algorithms at once, in real time:

  1. Each algorithm needs to run in its own isolated container
  2. All algorithms share the same common code / environment setup; only one file differs per algorithm (let's call it the "algo-file"), and it contains the custom code for that algorithm
  3. Algorithms are long-running: they should stay live until they get a termination signal from a main instance that I have built
  4. If an algorithm's container fails, ECS should replace it with a healthy container
  5. No two "algo-files" are the same across algorithms, even though all other files are identical
  6. I keep track of which "algo-files" have been deployed (and which have not) in a central database

Each container should also be able to interact with a common database, call external APIs, and receive requests from internal EC2 instances.

(EDITED) Any suggestions on how this may be architected?


Solution

  • You could probably issue one ECS run-task command per algorithm, with a different environment variable in each command specifying which algo-file that task should use. You would also code/configure your Docker containers so that they keep running until a stop command is sent to each one. There would be no ECS service here; each task would just run as a separate standalone task. In this scenario, you don't get the automatic replacement of failed containers that you asked for in requirement 4.

    If you wanted to run all of these as a single ECS service, it would be more complicated, because an ECS service runs N identical tasks. You would need some sort of external coordination mechanism, maybe as simple as a DynamoDB table, that each task in the service connects to at startup in order to pick an algorithm that hasn't already been chosen by any other task. That's only a very high-level suggestion, to give you an idea of how you would have to build some of this yourself, since an ECS service does not support what you are trying to do out of the box.
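
To make the first option (one run-task call per algorithm) more concrete, here is a minimal Python sketch. It assumes a boto3-style ECS client is passed in, and that the container reads a hypothetical `ALGO_FILE` environment variable to decide which algo-file to load. The cluster name `algo-cluster`, task definition `algo-runner`, and container name `algo` are placeholders, not anything from the question. Launch type and network configuration are left out and would need to be added for your setup.

```python
from typing import Any, Dict, List


def build_run_task_request(
    algo_file: str,
    cluster: str = "algo-cluster",         # hypothetical cluster name
    task_definition: str = "algo-runner",  # hypothetical shared task definition
    container_name: str = "algo",          # hypothetical container name
) -> Dict[str, Any]:
    """Build the keyword arguments for one ECS RunTask call.

    Every algorithm uses the same task definition; only the ALGO_FILE
    environment variable (an assumed name) differs between tasks.
    """
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "count": 1,
        "overrides": {
            "containerOverrides": [
                {
                    "name": container_name,
                    "environment": [{"name": "ALGO_FILE", "value": algo_file}],
                }
            ]
        },
    }


def launch_all(ecs_client: Any, algo_files: List[str]) -> Dict[str, List[str]]:
    """Start one standalone task per algo-file.

    Returns a mapping of algo-file to the ARNs of the tasks that were started,
    which could then be recorded in the central database.
    """
    started: Dict[str, List[str]] = {}
    for algo_file in algo_files:
        response = ecs_client.run_task(**build_run_task_request(algo_file))
        started[algo_file] = [task["taskArn"] for task in response.get("tasks", [])]
    return started
```

With boto3 installed, this would be called as something like `launch_all(boto3.client("ecs"), ["algo_001.py", "algo_002.py"])`. Because these are standalone tasks, nothing restarts them on failure, so whatever tracks the returned task ARNs would also have to notice stopped tasks and relaunch them.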
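
For the single-service option, the "pick an algorithm nobody else has chosen" step could be done with a DynamoDB conditional write. The sketch below is only one way to do it, under these assumptions: a boto3-style `Table` object is passed in, the table has one item per algorithm keyed by a hypothetical `algo_file` attribute, and a claimed item carries a hypothetical `claimed_by` attribute. The list of candidate algo-files is assumed to come from wherever you already track them.

```python
from typing import Any, Iterable, Optional


def try_claim(table: Any, algo_file: str, task_id: str) -> bool:
    """Atomically claim one algo-file for this task.

    The condition makes the write succeed only if the item exists and no
    other task has set claimed_by yet, so two tasks cannot both win.
    """
    try:
        table.update_item(
            Key={"algo_file": algo_file},
            UpdateExpression="SET claimed_by = :task",
            ConditionExpression=(
                "attribute_exists(algo_file) AND attribute_not_exists(claimed_by)"
            ),
            ExpressionAttributeValues={":task": task_id},
        )
        return True
    except Exception as err:
        # botocore's ClientError exposes the service error code on .response
        code = getattr(err, "response", {}).get("Error", {}).get("Code")
        if code == "ConditionalCheckFailedException":
            return False  # already claimed (or missing): try another one
        raise


def claim_first(table: Any, candidates: Iterable[str], task_id: str) -> Optional[str]:
    """Return the first algo-file this task managed to claim, or None."""
    for algo_file in candidates:
        if try_claim(table, algo_file, task_id):
            return algo_file
    return None
```

Each task would call `claim_first` at startup with its own task ID and then load the returned algo-file. One gap in this sketch: a task that crashes leaves its claim behind, so the replacement task started by the service would not be able to pick that algorithm up again. You would need some way to release or expire stale claims, for example a heartbeat timestamp on the item.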