apache-spark, kubernetes, apache-zeppelin

Spark on Kubernetes with Zeppelin


I am following this guide to spin up a Zeppelin container in a local Kubernetes cluster set up with minikube:

https://zeppelin.apache.org/docs/0.9.0-SNAPSHOT/quickstart/kubernetes.html

I am able to set up Zeppelin and run some sample code there. I downloaded the Spark 2.4.5 and 2.4.0 source code and built each with Kubernetes support using the following command:

./build/mvn -Pkubernetes -DskipTests clean package

Once Spark was built, I created a Docker image as explained in the guide:

bin/docker-image-tool.sh -m -t 2.4.X build

I configured Zeppelin to use the Spark image that was built with Kubernetes support. According to the guide above, the Spark interpreter automatically configures Spark on Kubernetes to run in client mode and runs the job.

But whenever I try to run any paragraph with Spark, I receive the following error:

Exception in thread "main" java.lang.IllegalArgumentException: basedir must be absolute: ?/.ivy2/local

I tried setting the Spark configuration property spark.jars.ivy in Zeppelin to point to a temporary directory, but that does not work either.

I found a similar issue here: basedir must be absolute: ?/.ivy2/local

But I can't seem to configure Spark to run with the spark.jars.ivy /tmp/.ivy setting. I also tried including it in spark-defaults.conf when building Spark, but that does not seem to work either.

I am quite stumped by this problem. Any guidance on how to solve it would be appreciated.

Thanks!


Solution

  • I also ran into this problem. The workaround I used to set spark.jars.ivy=/tmp/.ivy was to pass it through an environment variable instead.

    In your Spark interpreter settings, add the property SPARK_SUBMIT_OPTIONS and set its value to --conf spark.jars.ivy=/tmp/.ivy.

    This passes the additional options to spark-submit, and your job should then run.
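
As a rough illustration of the workaround above, the following shell sketch builds the SPARK_SUBMIT_OPTIONS value from an Ivy directory and rejects a non-absolute path such as the ?/.ivy2/local from the error message. The helper name ivy_submit_options is hypothetical, and /tmp/.ivy is only the path used in the answer; any absolute, writable directory inside the container is assumed to work. Whether you export the variable in the environment or enter the same value as an interpreter property depends on your setup.

```shell
# Hypothetical helper: print the spark-submit option for a given Ivy directory.
# Returns non-zero if the directory is not an absolute path, which is the
# condition the "basedir must be absolute" error complains about.
ivy_submit_options() {
  dir="$1"
  case "$dir" in
    /*) printf '%s\n' "--conf spark.jars.ivy=${dir}" ;;
    *)  echo "ivy directory must be absolute: ${dir}" >&2
        return 1 ;;
  esac
}

# /tmp/.ivy is the path from the answer; adjust it for your image.
SPARK_SUBMIT_OPTIONS="$(ivy_submit_options /tmp/.ivy)"
export SPARK_SUBMIT_OPTIONS

# Show the value that would be handed to the interpreter.
echo "$SPARK_SUBMIT_OPTIONS"
```

The printed string is the same value the answer suggests entering for the SPARK_SUBMIT_OPTIONS property.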