Tags: kubernetes, cloud, kubernetes-pod, kubeflow, kubeflow-pipelines

Instantiate and Shutdown Kubeflow pods


I'm learning about Kubernetes and Kubeflow, and there's something I want to do, but I can't find a clear answer online about whether it's possible or what route I should take.

When training my machine learning model, I want to use a large machine on the cloud, but afterwards I only want to serve the model on a small instance. I want the large machine to be used only for the training step and shut down after that. Is it possible to do that with Kubeflow? And if so, how would I go about doing it?

Sorry for the newbie question, I'm still learning about this platform.


Solution

  • One way to do this is to run two separate clusters: a large cluster for training and a smaller one for serving, so the large cluster can be torn down (or scaled to zero) once training finishes. On the large cluster, use Kubeflow Pipelines to train the model and then upload the model file to distributed storage (e.g. an S3 or GCS bucket). On the smaller cluster, run KFServing standalone and point its InferenceService at that storage location so it loads the model binary at startup.
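The handoff described above — a training step that writes a model artifact to shared storage, and a separate serving step that loads it — can be sketched in plain Python. This is a minimal illustration, not the Kubeflow API: the local path stands in for distributed storage (in practice an S3/GCS bucket), and `train_step`/`serving_step` are hypothetical names for the two pipeline stages.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for distributed storage; in a real setup this would be an
# object-store URI such as s3://bucket/model.pkl or gs://bucket/model.pkl.
SHARED_STORE = Path(tempfile.mkdtemp())
MODEL_PATH = SHARED_STORE / "model.pkl"


def train_step(samples):
    """Runs on the large training cluster: train, then persist the artifact."""
    # Trivial "model" for illustration: predict the mean of the training data.
    model = {"mean": sum(samples) / len(samples)}
    with open(MODEL_PATH, "wb") as f:
        pickle.dump(model, f)
    return MODEL_PATH  # only this artifact path is handed to the serving side


def serving_step(model_path):
    """Runs on the small serving cluster: load the artifact and predict."""
    with open(model_path, "rb") as f:
        model = pickle.load(f)
    return model["mean"]


artifact = train_step([1.0, 2.0, 3.0])
print(serving_step(artifact))  # → 2.0
```

The key design point is that the two steps share nothing except the artifact in storage, which is what lets the training hardware be shut down independently of the serving instance.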