I have a VMSS with an instance count of, say, 3. Let's say I have configured a rule so that if CPU utilization is <20%, the instance count is reduced from 3 to 1. Assume all 3 instances are serving requests, and each request takes 60 seconds to complete.
Now assume CPU utilization drops to 15%, so the instance count should be reduced by 2. What happens to the requests that are still being served by the two instances being removed? Do those instances hand their in-flight work over to another instance, or does the scale set wait to reduce the count until they have completed their ongoing requests?
I have already attached the scale set to an Application Gateway and enabled connection draining so that ongoing requests should not be dropped. But they are still being dropped. Since that is not working, I am trying to do something with API Management revisions and versions.
Expectation: when a scale-down/scale-in happens, ongoing requests should not be dropped.
The scale set has no understanding of what is going on inside your VMs or which requests are in flight. When the scale-down threshold is reached, the VMs are removed and any requests still running on them fail.
You should be using a load balancer in front of your scale set to ensure that traffic is no longer sent to the VMs being shut down. Your application also needs to be built to retry requests that fail because of a scale-down.
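As a rough illustration of the retry advice, here is a minimal Python sketch of a client-side retry with exponential backoff. The helper name `call_with_retry`, its parameters, and the choice of `ConnectionError`/`TimeoutError` as the retryable failures are all assumptions made for this example, not part of any Azure SDK; adapt the exception types to whatever your HTTP client actually raises when an instance disappears mid-request.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    send: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.5,
    retry_on: tuple = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call send() and retry it with exponential backoff.

    All names here are hypothetical. `send` is any zero-argument
    function that performs one request and returns its result.
    """
    for attempt in range(attempts):
        try:
            return send()
        except retry_on:
            # Out of attempts: surface the last failure to the caller.
            if attempt == attempts - 1:
                raise
            # Wait 0.5s, 1s, 2s, ... so a retry is likely to land on
            # an instance that is still part of the scale set.
            sleep(base_delay * (2 ** attempt))
    # Only reached when attempts < 1, so the loop never ran.
    raise ValueError("attempts must be at least 1")
```

You would pass in a function that issues the actual HTTP call through the load balancer or Application Gateway. Note that blind retries are only safe for idempotent requests; for a 60-second operation that changes state, you would probably want the server to detect duplicates (for example with a request ID) before retrying.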