jupyter-notebook nvidia rancher k3s kubeflow

Unable to run notebooks on GPU (on-premises) with Kubeflow on a Rancher k3s cluster


I have installed Kubeflow on a k3s cluster. I have configured the cluster to access GPUs, and this works for test pods: I can see nvidia-smi output from those pods.

However, I want notebooks created from the Kubeflow dashboard to run on GPUs, and I cannot find out how to do this. I selected GPUs in the notebook creation UI on the dashboard, but inside the running Jupyter notebook the following check still fails:

torch.cuda.is_available() returns False
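
To narrow down where GPU visibility is lost, a slightly broader check can be run in a notebook cell. This is a minimal sketch: the helper name `gpu_report` is hypothetical, and it assumes that `nvidia-smi` appears on the PATH only when the NVIDIA container runtime has injected the driver tools, which is typical but not guaranteed for every image.

```python
import shutil


def gpu_report():
    """Collect basic facts about GPU visibility inside the current container."""
    report = {
        # nvidia-smi is typically injected by the NVIDIA container runtime
        # (assumption: the image does not ship its own copy).
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        "torch_installed": False,
        "cuda_available": False,
        "device_count": 0,
    }
    try:
        import torch
    except ImportError:
        # Without PyTorch we can only report on nvidia-smi.
        return report
    report["torch_installed"] = True
    report["cuda_available"] = torch.cuda.is_available()
    report["device_count"] = torch.cuda.device_count()
    return report


print(gpu_report())
```

If `nvidia_smi_on_path` is False inside the notebook pod while it is True in the test pods, the notebook container is probably not being started with the NVIDIA runtime.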

On k3s, we need to add "runtimeClassName: nvidia" to the pod spec, but I see no way to do this in the Kubeflow manifests. Please suggest a fix, or point out what I am missing.
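
For reference, the kind of test pod that does work with the runtime class set explicitly might look like the sketch below, which builds the manifest as a Python dict and prints it as JSON. The pod name, container name, and image tag are placeholders chosen for illustration. It assumes a RuntimeClass named `nvidia` exists in the cluster and that the NVIDIA device plugin exposes the `nvidia.com/gpu` resource.

```python
import json


def gpu_test_pod(name="gpu-test", image="nvidia/cuda:12.2.0-base-ubuntu22.04"):
    """Build a Pod manifest that requests the nvidia runtime class explicitly.

    The name and image defaults are placeholders, not values from the question.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # The field that notebook pods created by the dashboard are missing.
            "runtimeClassName": "nvidia",
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "cuda",
                    "image": image,
                    "command": ["nvidia-smi"],
                    # Request one GPU through the device plugin resource.
                    "resources": {"limits": {"nvidia.com/gpu": 1}},
                }
            ],
        },
    }


print(json.dumps(gpu_test_pod(), indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, since kubectl accepts JSON manifests as well as YAML.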

Thanks in advance.


Solution

  • I solved this issue by setting nvidia as the default container runtime, so that pods without "runtimeClassName: nvidia" also get GPU access. Run the following on your machine with the GPU:

    sudo nvidia-ctk runtime configure --runtime=containerd --nvidia-set-as-default

    sudo systemctl restart containerd

    sudo systemctl status containerd
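
One way to confirm that the change above took effect is to look for the `default_runtime_name` setting in the containerd configuration. The sketch below is a hedged example, not part of the original answer: the helper `default_runtime` is hypothetical, it uses a simple regular expression rather than a full TOML parser, and the sample text only imitates what such a config section usually looks like.

```python
import re


def default_runtime(config_text):
    """Return the value of default_runtime_name in a containerd config, or None."""
    match = re.search(
        r'^\s*default_runtime_name\s*=\s*"([^"]*)"',
        config_text,
        re.MULTILINE,
    )
    return match.group(1) if match else None


# Illustrative sample only; a real config contains many more settings.
sample = '''
[plugins."io.containerd.grpc.v1.cri".containerd]
  default_runtime_name = "nvidia"
'''

print(default_runtime(sample))
```

On a real node, the text would be read from the containerd config file instead of `sample`. That file is often `/etc/containerd/config.toml`, but the path is an assumption here and may differ, particularly on k3s, which manages its own containerd configuration.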