On AWS, I created an Auto Scaling group (ASG) with a scaling policy that adds a new instance when the Application Load Balancer metric "Average Request Count Per Target" rises above 5.
The metric is the number of HTTP requests sent to the load balancer, averaged over the instances in the target group.
The ASG is set to min 1, max 10 and desired 1.
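For reference, a policy like the one described above can be written down with boto3 roughly as follows. This is only a sketch: it assumes a target tracking policy (the post does not say whether the policy is target tracking or step scaling), and the function names, the policy name and the example labels are made up for illustration.

```python
def request_count_policy(lb_label, tg_label, target=5.0):
    """Build a target tracking configuration for ALBRequestCountPerTarget.

    lb_label: "app/<load-balancer-name>/<load-balancer-id>"
    tg_label: "targetgroup/<target-group-name>/<target-group-id>"
    """
    return {
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ALBRequestCountPerTarget",
            # The resource label joins the two ARN suffixes with a slash.
            "ResourceLabel": f"{lb_label}/{tg_label}",
        },
        # Average number of requests per target that the ASG tries to hold.
        "TargetValue": float(target),
    }


def apply_policy(asg_name, config, policy_name="requests-per-target"):
    """Attach the configuration to an ASG (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without AWS access

    client = boto3.client("autoscaling")
    return client.put_scaling_policy(
        AutoScalingGroupName=asg_name,
        PolicyName=policy_name,  # hypothetical name
        PolicyType="TargetTrackingScaling",
        TargetTrackingConfiguration=config,
    )
```

Calling `apply_policy("my-asg", request_count_policy("app/my-alb/123", "targetgroup/my-tg/456"))` would register the policy; the names and IDs in that call are placeholders.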
I sent 200 requests to the load balancer and recorded in a database the IP of the instance that handled each request. Most of the requests went to the same instance, some of them failed with 504 Gateway Timeout, and a few received no response at all.
The ASG does launch new instances, but only after the requests have already been sent, so the new instances receive nothing from the load balancer.
I think the reason is that CloudWatch reports the average number of requests per instance only once a minute or less often, and launching a new instance probably takes longer than the request timeout.
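The timing argument can be made concrete with a rough back-of-the-envelope calculation. Except for the 60-second metric period and the load balancer's default 60-second idle timeout, every number below is an assumption picked for illustration, not a measurement from this setup.

```python
def seconds_until_new_capacity(metric_period_s, evaluation_periods, launch_s, warmup_s):
    """Worst-case delay between a burst arriving and a new instance serving traffic."""
    alarm_delay = metric_period_s * evaluation_periods  # time until the alarm fires
    return alarm_delay + launch_s + warmup_s


def burst_times_out(request_timeout_s, **delays):
    """True if requests in the burst give up before the new instance is ready."""
    return seconds_until_new_capacity(**delays) > request_timeout_s


delay = seconds_until_new_capacity(
    metric_period_s=60,    # ALB metrics are reported at 60-second intervals
    evaluation_periods=1,  # assumed: alarm fires after one datapoint
    launch_s=90,           # assumed: instance boot time
    warmup_s=30,           # assumed: time to pass the first health checks
)
print(delay)  # 180
print(burst_times_out(60, metric_period_s=60, evaluation_periods=1,
                      launch_s=90, warmup_s=30))  # True
```

With numbers of this order, every request in the burst has already timed out or been served by the single existing instance before the new capacity is registered.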
Q: Is there a way to hold the requests in a queue, or to increase their timeout, until the new instances exist, and then distribute the requests across all instances instead of losing them?
Q: If a user sends many requests at the same time, I want the ASG to start scaling immediately and the requests to be distributed uniformly across the instances, keeping a specific average number of requests per instance.
The solution was to use Amazon Simple Queue Service (SQS). We forward the messages from the API Gateway to the queue. A CloudWatch alarm then starts ECS Fargate tasks when the queue size is > 1; the tasks read messages from the queue and process them. When the queue is empty, another alarm sets the number of tasks in the ECS service to 0.
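The two halves of that setup can be sketched in Python as below. This is an illustration, not our actual code: `desired_task_count` just mirrors the two alarm rules as a plain function (in the real setup the CloudWatch alarms make this decision), and `drain_queue` is a hypothetical worker loop for the Fargate task. The client passed in is assumed to behave like `boto3.client("sqs")`.

```python
def desired_task_count(queue_depth, current_tasks, tasks_when_busy=1):
    """Mirror the two alarms: scale out when the queue size is > 1, to 0 when empty."""
    if queue_depth == 0:
        return 0  # "queue is empty" alarm
    if queue_depth > 1:
        return max(current_tasks, tasks_when_busy)  # "queue size > 1" alarm
    return current_tasks  # neither alarm fires


def drain_queue(sqs, queue_url, handler, wait_seconds=20):
    """Read and process messages until the queue returns nothing.

    sqs is an SQS client; handler is called with each message body.
    Returns the number of messages processed.
    """
    processed = 0
    while True:
        response = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=10,        # SQS returns at most 10 per call
            WaitTimeSeconds=wait_seconds,  # long polling
        )
        messages = response.get("Messages", [])
        if not messages:
            return processed
        for message in messages:
            handler(message["Body"])
            # Delete only after the handler succeeded; otherwise the message
            # becomes visible again and is retried.
            sqs.delete_message(
                QueueUrl=queue_url,
                ReceiptHandle=message["ReceiptHandle"],
            )
            processed += 1
```

Inside the task this would be called as `drain_queue(boto3.client("sqs"), queue_url, process)`, where `queue_url` and `process` are whatever the application defines. Because the messages wait in the queue until a task picks them up, a burst is no longer lost while capacity is still starting.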