In AWS, I have an ECS cluster containing a service backed by 2 EC2 instances. I sent 3 separate API requests to this service, each of which should take about an hour to run at 100% capacity. I sent the requests a couple of minutes apart, but they all went to the same instance and left the other idle. Here's a graph of my Service CPU Utilization; it is not using all of its capacity. What am I missing? Why won't requests go to the second EC2 instance?
An ALB will not perfectly round-robin between two instances. If you sent 100 requests 100 times, then on average each instance would receive 50 requests, but most of the time it won't be exactly 50 for each backend.
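To see why a small number of requests can land lopsidedly, here is a toy simulation. It deliberately simplifies: it assumes each request is routed independently at random between the two targets, which is not exactly how an ALB balances (it round-robins per load-balancer node and reuses keep-alive connections), but it illustrates the same point — with only 3 requests, very uneven splits are common.

```python
import random

def all_on_one_backend(requests, backends=2, trials=100_000, seed=0):
    """Estimate how often `requests` independent random picks
    all land on a single backend."""
    rng = random.Random(seed)
    all_one = 0
    for _ in range(trials):
        picks = {rng.randrange(backends) for _ in range(requests)}
        if len(picks) == 1:          # every request hit the same target
            all_one += 1
    return all_one / trials

# With 3 requests over 2 instances, all three land on the same
# instance about 25% of the time (2 * 0.5**3 = 0.25).
print(all_on_one_backend(3))
```

So even a "fair" balancer gives you a noticeable chance of exactly the situation in the question when the request count is tiny; load balancing only evens out in aggregate.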
For a long-running task like this it is preferable to use something else, such as SQS, whereby each container will only process x messages at a time (most of the time you'd want x=1). Each instance can then poll SQS for the work, and won't take more work whilst it is busy.
You will receive other benefits too, such as being able to see how long a message takes to finish, and error-handling capabilities to account for timeouts or a server dying whilst it is doing work.
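A minimal sketch of such a worker loop, assuming boto3 and a queue URL of your own (the queue URL and the hour-long `handler` are placeholders, not anything from the question):

```python
def process_one(sqs, queue_url, handler):
    """Receive at most one SQS message, run the handler, then delete it.

    While the handler runs, the message stays invisible to other workers
    for VisibilityTimeout seconds; if this instance dies mid-job, SQS
    makes the message visible again so another instance can retry it.
    Returns True if a message was processed, False if the queue was empty.
    """
    resp = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=1,   # x=1: one long job per worker at a time
        WaitTimeSeconds=20,      # long polling to cut empty receives
        VisibilityTimeout=4500,  # somewhat longer than the ~1h job
    )
    messages = resp.get("Messages", [])
    if not messages:
        return False
    msg = messages[0]
    handler(msg["Body"])         # the hour-long work happens here
    # Only delete after the work succeeds, so a crash mid-job
    # leaves the message to be redelivered.
    sqs.delete_message(QueueUrl=queue_url,
                       ReceiptHandle=msg["ReceiptHandle"])
    return True

# On each instance, roughly:
#   import boto3
#   sqs = boto3.client("sqs")
#   while True:
#       process_one(sqs, "https://sqs.../my-queue", do_hour_of_work)
```

Because each instance only asks for the next message once it has finished (and deleted) the previous one, the work spreads itself across however many instances are idle — no load balancer guesswork involved.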