Tags: tensorflow, kubernetes, scale, tensorflow-serving

Kubernetes + TF Serving - how to serve hundreds of ML models without keeping hundreds of idle pods running?


I have hundreds of models, organized by category, project, etc. Some of the models are heavily used, while others are accessed only occasionally. How can I trigger a scale-up operation only when needed (for the infrequently used models), instead of running hundreds of pods serving hundreds of models while most of them sit idle? That is a huge waste of computing resources.


Solution

  • What you are trying to do is scale deployments to zero when they are not in use.

    K8s does not provide such functionality out of the box.

    You can achieve it with the Knative Pod Autoscaler. Knative is probably the most mature solution available at the time of writing.

    There are also more experimental solutions, such as osiris or zero-pod-autoscaler, that you may find interesting and that could be a good fit for your use case.
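    As a sketch of the Knative approach: each model can be wrapped in its own Knative Service, which the Knative Pod Autoscaler scales to zero when no requests arrive. The service name, container image, and model path below are placeholders, not values from the question.

    ```yaml
    # Hypothetical Knative Service wrapping a TF Serving container.
    # With min-scale set to "0" (the KPA default), Knative removes all
    # pods for this model after an idle period and cold-starts one
    # when the next request arrives.
    apiVersion: serving.knative.dev/v1
    kind: Service
    metadata:
      name: my-model          # placeholder name
    spec:
      template:
        metadata:
          annotations:
            # Allow scaling down to zero idle pods.
            autoscaling.knative.dev/min-scale: "0"
        spec:
          containers:
            - image: tensorflow/serving
              args:
                - "--model_name=my_model"                    # placeholder
                - "--model_base_path=gs://my-bucket/my_model" # placeholder
              ports:
                - containerPort: 8501  # TF Serving REST port
    ```

    The trade-off is cold-start latency: the first request to an idle model waits for a pod to start and the model to load, so this pattern fits the rarely used models better than the heavily used ones.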