tensorflow google-cloud-platform google-ai-platform

GCP: AI Platform ML serving with autoscaling to zero

I want to try ML serving on GCP's AI Platform, but I want nodes to run only while prediction requests are coming in, and to scale down to zero otherwise.

I see in the documentation here:

If you select "Auto scaling", the optional Minimum number of nodes field displays. You can enter the minimum number of nodes to keep running at all times, when the service has scaled down. This field defaults to 0.

But when I try to create my model version, I get an error telling me that this field should be > 1.

Here is what I tried:

Name: testv1
Pre-Built Container
Python 3.7
Framework Tensorflow
TF version 2.4.0
ML 2.4
Scaling auto-scaling
Min nodes nb 0
machine type n1-standard-4
GPU TESLA_K80 * 1

Solution

I tried to reproduce your case and got the same result: I was not able to set the Minimum number of nodes to 0.

This seems to be an outdated documentation issue. There is an ongoing Feature Request that explains it was possible to set a minimum of 0 machines with a legacy machine type, and requests to make this option available for current types too.
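To make the constraint concrete, here is a minimal Python sketch that builds a version request body matching the settings in the question and rejects a minimum of 0 nodes for non-legacy machine types. The field names follow my understanding of the AI Platform Prediction `Version` resource. The legacy machine type names, the `gs://my-bucket/model/` path, and the exact rule (at least 1 node outside legacy types) are assumptions for illustration, not something confirmed by the error message above.

```python
# Assumption: these are the legacy machine types that still accept minNodes = 0.
LEGACY_MACHINE_TYPES = {"mls1-c1-m2", "mls1-c4-m2"}


def build_version_body(name, deployment_uri, machine_type, min_nodes):
    """Build a request body for creating a model version (hypothetical helper).

    Raises ValueError when min_nodes is below 1 on a non-legacy machine type,
    mirroring the error seen in the console.
    """
    if machine_type not in LEGACY_MACHINE_TYPES and min_nodes < 1:
        raise ValueError(
            f"minNodes must be at least 1 for machine type {machine_type!r}"
        )
    body = {
        "name": name,
        "deploymentUri": deployment_uri,  # hypothetical Cloud Storage path
        "runtimeVersion": "2.4",
        "framework": "TENSORFLOW",
        "pythonVersion": "3.7",
        "machineType": machine_type,
        "autoScaling": {"minNodes": min_nodes},
    }
    if machine_type not in LEGACY_MACHINE_TYPES:
        # GPU settings from the question; only attached for non-legacy types here.
        body["acceleratorConfig"] = {"count": 1, "type": "NVIDIA_TESLA_K80"}
    return body
```

Calling `build_version_body("testv1", "gs://my-bucket/model/", "n1-standard-4", 0)` raises `ValueError`, while the same call with `min_nodes=1` returns a body you could pass to the version-creation API.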

In the meantime, I opened a ticket to get the documentation updated.

As a workaround, you can deploy your models right when you need them and un-deploy them afterwards. Be mindful that un-deployments may take up to 45 minutes, so it is advisable to wait 1 hour before re-deploying the same model to avoid any issues.
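The timing part of this workaround can be sketched in Python as below. The helper names are hypothetical, and mapping "un-deploy" to `gcloud ai-platform versions delete` is my assumption about how you would script it. The snippet only builds the command and checks the 1-hour wait; it does not call GCP.

```python
from datetime import datetime, timedelta, timezone

# Figure taken from the advice above: wait 1 hour after un-deploying.
REDEPLOY_WAIT = timedelta(hours=1)


def earliest_redeploy(undeploy_started_at):
    """Return the earliest time at which re-deploying is considered safe."""
    return undeploy_started_at + REDEPLOY_WAIT


def can_redeploy(undeploy_started_at, now=None):
    """True once the 1-hour wait since the un-deployment has elapsed."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= earliest_redeploy(undeploy_started_at)


def delete_version_command(model, version):
    """Build (but do not run) the gcloud command that removes a model version."""
    return [
        "gcloud", "ai-platform", "versions", "delete", version,
        "--model", model,
        "--quiet",  # skip the interactive confirmation prompt
    ]
```

For example, record the time when you un-deploy `testv1`, then call `can_redeploy(started)` before creating the version again. The command list can be handed to `subprocess.run` if you want to automate the un-deploy step.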