I'm setting up an InferenceService using Argo and KFServing with Amazon EKS (Kubernetes). It's important to know that our team has one EKS cluster per environment, which means there can be multiple applications within our cluster that we don't control.
Here is what we have set up so far:
Argo to submit workflows which start the training in #1. When installing Argo into our Kubernetes cluster, we noticed that its components are sometimes assigned to the GPU nodes.
The current setup we have for #2 and #3 (above) seems to prevent KFServing from scaling down to zero. It concerns us that having these components on the GPU nodes would not allow those nodes to scale down.
Which pods need to be assigned to our GPU nodes?
(Option 1) Do we only need our argo workflow pod to be assigned and repel the rest?
-- OR --
(Option 2) Are there other kfserving components needed within the GPU node to work right?
Option 1: How do we repel all pods from going into our GPU nodes other than the argo workflow pod? As a reminder, we have other applications we can't control, so adding node affinities for every pod seems unrealistic.
Option 2: How do the GPU nodes scale to zero when these GPU nodes have kfserving components on them? I was under the impression that a node can only be scaled down when no pods are running on it.
tl;dr You can use taints.
Which pods need to be assigned to our GPU nodes?
The pods of the jobs that require GPU.
If your training job requires GPU, you need to assign it using the nodeSelector and tolerations in the spec of your training job or deployment; see a nice example here.
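As a rough illustration of the nodeSelector and tolerations approach, an Argo workflow template for a GPU training step might look like the sketch below. The label key nodegroup-type, the taint key nvidia.com/gpu with value "true", the image, and the command are all assumptions made for illustration; they must match whatever labels and taints your GPU nodegroup actually carries. The nvidia.com/gpu resource limit assumes the NVIDIA device plugin is running on the GPU nodes.

```yaml
# Sketch only: label key, taint key/value, names, and image are hypothetical.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: gpu-training-
spec:
  entrypoint: train
  templates:
    - name: train
      # Schedule this step only onto nodes carrying the (assumed) GPU label.
      nodeSelector:
        nodegroup-type: gpu
      # Tolerate the (assumed) NoSchedule taint on the GPU nodegroup.
      tolerations:
        - key: nvidia.com/gpu
          operator: Equal
          value: "true"
          effect: NoSchedule
      container:
        image: example.com/your-training-image:latest  # placeholder image
        command: ["python", "train.py"]                # placeholder command
        resources:
          limits:
            nvidia.com/gpu: 1  # request one GPU for the training container
```

The nodeSelector pulls the pod toward the GPU nodes, while the toleration is what lets it past the taint; pods from other applications have neither, so they stay off those nodes.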
If your model is CV/NLP (many matrix multiplications), you might want to run the InferenceService on GPU as well; in that case you need to request the GPU in its spec, as described here.
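A minimal sketch of an InferenceService that requests a GPU is shown below. It assumes the v1beta1 API and a TensorFlow predictor; the name and storageUri are placeholders, and older KFServing installs that use v1alpha2 nest the predictor under spec.default instead. Depending on how the GPU nodegroup is tainted, the predictor pod may also need a matching toleration, which is not shown here.

```yaml
# Sketch only: API version, predictor type, name, and storageUri are assumptions.
apiVersion: serving.kubeflow.org/v1beta1
kind: InferenceService
metadata:
  name: my-model
spec:
  predictor:
    tensorflow:
      storageUri: s3://your-bucket/path/to/model  # placeholder model location
      resources:
        limits:
          nvidia.com/gpu: 1  # request one GPU for the predictor container
```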
Do we only need our argo workflow pod to be assigned and repel the rest?
Yes, if your inferenceservice does not require GPU.
Are there other kfserving components needed within the GPU node to work right?
No, the only kfserving component is the kfserving-controller, and it does not require a GPU, as it only orchestrates the creation of the Istio and Knative resources for your InferenceService.
If there are InferenceServices running in your GPU nodegroup without having the GPU requested in the spec, it means that the nodegroup is not configured with a taint that has the NoSchedule effect. Make sure that the GPU nodegroup in the eksctl configuration has the taint as described in the doc.
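For reference, a tainted GPU nodegroup in an eksctl ClusterConfig could be sketched roughly as follows. The cluster name, region, instance type, sizes, label, and taint key are illustrative assumptions, not values from the question. The map form of taints shown here is the one used for unmanaged nodeGroups; some eksctl versions and managed nodegroups use a list of key/value/effect entries instead, so check the doc for your version.

```yaml
# Sketch only: names, region, instance type, sizes, label, and taint key are hypothetical.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster
  region: us-west-2
nodeGroups:
  - name: gpu-nodes
    instanceType: p3.2xlarge
    minSize: 0          # allow the nodegroup to scale down to zero
    maxSize: 2
    desiredCapacity: 0
    labels:
      nodegroup-type: gpu   # matched by the nodeSelector of GPU workloads
    taints:
      # Repels every pod that does not carry a matching toleration.
      nvidia.com/gpu: "true:NoSchedule"
```

With the taint in place, pods from applications you don't control are kept off the GPU nodes without any change to their specs; only workloads that explicitly tolerate the taint can land there.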