The GCP AI Platform Unified documentation [1] says:
> AI Platform scales your nodes based on CPU usage even if you have configured your prediction nodes to use GPUs; therefore if your prediction throughput is causing high GPU usage, but not high CPU usage, your nodes might not scale as you expect.
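To make the quoted behaviour concrete, here is a toy sketch of a CPU-only target-tracking autoscaler. The proportional rule, the 60% target, the node limits, and the names `NodeMetrics` and `desired_nodes` are all hypothetical illustrations, not AI Platform's actual scaling algorithm. The only point is that a rule which reads CPU utilization never sees GPU load.

```python
from dataclasses import dataclass


@dataclass
class NodeMetrics:
    cpu_percent: int  # average CPU utilization across nodes, 0-100
    gpu_percent: int  # average GPU utilization across nodes, 0-100


def desired_nodes(current_nodes: int, metrics: NodeMetrics,
                  target_cpu_percent: int = 60,
                  min_nodes: int = 1, max_nodes: int = 10) -> int:
    """Hypothetical CPU-only scaling rule (illustration, not the real algorithm).

    Computes ceil(current_nodes * cpu_percent / target_cpu_percent) with
    integer arithmetic, then clamps to [min_nodes, max_nodes].
    metrics.gpu_percent is deliberately never read.
    """
    wanted = (current_nodes * metrics.cpu_percent + target_cpu_percent - 1) // target_cpu_percent
    return max(min_nodes, min(max_nodes, wanted))


# GPU-bound traffic: GPUs nearly saturated, CPU at target -> no scale-out.
gpu_bound = NodeMetrics(cpu_percent=60, gpu_percent=95)
print(desired_nodes(2, gpu_bound))  # → 2

# CPU-bound traffic: CPU above target -> scale out.
cpu_bound = NodeMetrics(cpu_percent=90, gpu_percent=30)
print(desired_nodes(2, cpu_bound))  # → 3
```

With GPU utilization at 95% the rule still keeps two nodes, which is the "might not scale as you expect" case described in the quote.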
How can we make the prediction nodes scale based on GPU usage instead?
[1]: https://cloud.google.com/ai-platform/prediction/docs/machine-types-online-prediction#specifying_gpus
[2]: https://cloud.google.com/ai-platform-unified/docs/resources/release-notes