I've set up autoscaling for a MIG based on Pub/Sub queue size:
gcloud compute instance-groups managed set-autoscaling my-group \
--zone=us-central1-a \
--max-num-replicas=10 \
--min-num-replicas=0 \
--update-stackdriver-metric=pubsub.googleapis.com/subscription/num_undelivered_messages \
--stackdriver-metric-filter="resource.type = pubsub_subscription AND resource.labels.subscription_id = my_subscription" \
--stackdriver-metric-single-instance-assignment=10
I notice that scaling down takes a significant amount of time AFTER num_undelivered_messages reaches 0. The last time I checked, it took 16 minutes from the moment the last message was acked until the MIG finally scaled down to 0.
How do I decrease it to ~60 seconds?
If we look at the linked article, we see that, by default, scale-in doesn't occur until 10 minutes after the signal that caused the scale-out has ceased. I read this as:
If your signal (Pub/Sub) was breached at 9:00am and the condition clears immediately afterwards, scale-in won't happen until at least 9:10am.
It seems that scale-in looks at the signal over the last 10-minute window. Also, because you are looking at GCP Monitoring metrics, which are not updated in real time but arrive with some latency, Monitoring may not report that all is well until, say, 9:05am, even if the trigger clears immediately. This means it might be 9:15am (9:05am + 10 minutes) before scale-in occurs.
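A rough way to write that timeline down, using the same assumptions as the example (the signal clears at 9:00am, Monitoring takes about 5 minutes to reflect it, and the default look-back window W is 10 minutes). The symbols are only illustrative shorthand, not terms from the GCP docs:

```latex
t_{\text{scale-in}} \;\gtrsim\; t_{\text{clear}} + \Delta_{\text{latency}} + W
\quad\Rightarrow\quad
\text{9:00} + 5\ \text{min} + 10\ \text{min} = \text{9:15}
```

Under this model, shortening W only shrinks the last term; the Monitoring latency term would still be there.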
Looking at the docs again, it seems we can change the scale-in policy using the --scale-in-control flag of gcloud (see the scale-in-control flag documentation). Looking closely, it has a parameter called time-window that is documented as:
How long back autoscaling should look when computing recommendations. The autoscaler will not resize below the maximum allowed deduction subtracted from the peak size observed in this period. Measured in seconds.
This seems to allow us to override the default 10-minute period.
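For example, the original command could be re-run with a scale-in control added. This is an untested sketch: it assumes time-window is given in seconds (as the documentation quoted above says) and that 60 is an accepted value. max-scaled-in-replicas=10 is an illustrative choice that matches --max-num-replicas, so that the group is not prevented from dropping all the way to 0 within one window. Because set-autoscaling is given the full policy, the original flags are repeated to be safe.

```shell
# Same policy as in the question, plus a scale-in control that looks back
# only 60 seconds (time-window is in seconds) and allows up to 10 replicas
# to be removed within that window. Values are illustrative assumptions.
gcloud compute instance-groups managed set-autoscaling my-group \
  --zone=us-central1-a \
  --max-num-replicas=10 \
  --min-num-replicas=0 \
  --update-stackdriver-metric=pubsub.googleapis.com/subscription/num_undelivered_messages \
  --stackdriver-metric-filter="resource.type = pubsub_subscription AND resource.labels.subscription_id = my_subscription" \
  --stackdriver-metric-single-instance-assignment=10 \
  --scale-in-control=max-scaled-in-replicas=10,time-window=60
```

Even with a 60-second window, the Monitoring metric latency described above still applies, so the observed scale-in delay may remain somewhat longer than 60 seconds.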