apache-spark, kubernetes, jupyter-notebook, mlops, mlrun

MLRun Spark service fails to start and breaks all Jupyter notebooks


I reconfigured the Spark infrastructure in K8s (part of the MLRun/Iguazio platform), and afterwards several services started failing:

  • Spark service (status: Failed)
  • All Jupyter notebooks (status: Failed dependencies)

There was also a general error message:

Some services have not been successfully deployed. Check the services status as shown below.

See the screenshot: [image]

I only changed the amount of RAM (1-30 GB), the vCPU range (1-14), and the number of replicas (3).

Has anyone run into a similar issue, and how can it be avoided?


Solution

  • It was a human mistake and the fix was easy. The key problem was in the Spark service configuration: I had configured extremely small vCPU values, which caused timeouts when the Spark service started:

    • I set vCPU to the range 1-14 but left the unit at its default, millicpu (not cpu), so the service was given only 0.001-0.014 of a CPU core.
    • After I switched the unit to cpu and restarted the Spark service, everything was fine.

    Wrong setting: [image]

    Correct setting: [image]
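
To make the unit mix-up concrete, here is a minimal sketch of how Kubernetes-style CPU quantities translate to millicores (1 cpu = 1000 millicpu, written with an `m` suffix). The function names and the 100-millicore warning threshold are assumptions made for illustration; they are not part of MLRun, Iguazio, or the Kubernetes API.

```python
def cpu_to_millicores(quantity: str) -> int:
    """Convert a Kubernetes-style CPU quantity ("14", "0.5", "14m") to millicores."""
    q = quantity.strip()
    if q.endswith("m"):
        # "14m" means 14 millicpu, i.e. 0.014 of one core
        return int(q[:-1])
    # A plain number is whole (or fractional) CPUs: 1 cpu = 1000 millicpu
    return round(float(q) * 1000)


def looks_like_unit_mistake(quantity: str, floor_millicores: int = 100) -> bool:
    """Flag CPU values below a hypothetical floor (assumed threshold, tune as needed)."""
    return cpu_to_millicores(quantity) < floor_millicores


# The wrong setting from the answer: range 1-14 in millicpu
print(cpu_to_millicores("1m"), cpu_to_millicores("14m"))   # 1 14
# The intended setting: range 1-14 in cpu
print(cpu_to_millicores("1"), cpu_to_millicores("14"))     # 1000 14000
print(looks_like_unit_mistake("14m"))                      # True
print(looks_like_unit_mistake("14"))                       # False
```

With the millicpu unit, the upper limit of 14 is a thousand times smaller than the intended 14 cores, which is consistent with the service timing out on startup.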