Tags: mysql, jdbc, apache-spark, apache-zeppelin

Trying to load a jar and an external class


In my zeppelin-env.sh I am loading the MySQL JDBC connector JAR as follows:

export ZEPPELIN_JAVA_OPTS+=" -Dspark.jars=/usr/local/opt/mysql-connector-java/libexec/mysql-connector-java-5.1.32-bin.jar"
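
Once that jar is on the interpreter's classpath, the connector can be exercised from a notebook paragraph. A minimal sketch, assuming a local MySQL instance (the connection URL, table name and credentials below are placeholders):

val jdbcDf = sqlContext.read.format("jdbc").options(Map(
  "url"      -> "jdbc:mysql://localhost:3306/mydb",  // placeholder connection string
  "driver"   -> "com.mysql.jdbc.Driver",             // driver class shipped in the connector jar
  "dbtable"  -> "mytable",                           // placeholder table name
  "user"     -> "user",                              // placeholder credentials
  "password" -> "secret"
)).load()
jdbcDf.show()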

In addition, I'd like to load the Databricks CSV package, which is supposed to work in two (or more) ways:

  1. %dep z.load("com.databricks:spark-csv_2.10:1.2.0")
  2. export SPARK_SUBMIT_OPTIONS="--packages com.databricks:spark-csv_2.10:1.2.0"

The first works when no SPARK_HOME is set; SPARK_SUBMIT_OPTIONS, however, is only taken into account when SPARK_HOME points to an external Spark installation.
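
For reference, once the package is available via either route, reading a CSV from a notebook looks roughly like this (the file path is a placeholder):

val csvDf = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")       // first line holds column names
  .option("inferSchema", "true")  // let spark-csv guess column types
  .load("/path/to/file.csv")      // placeholder path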

How can I load the Databricks CSV package without setting SPARK_HOME? Or, how can I load all the other jars that get included when using the embedded Spark libraries, without setting SPARK_HOME?

I'd actually prefer to use a separate Spark installation that I can update independently of Zeppelin; however, I fear incompatibilities that I wouldn't have when sticking to the embedded Spark.


Solution

  • So I did set SPARK_HOME to an external Spark installation, which seems faster and was incredibly easy to set up with brew install apache-spark.
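
    With the Homebrew install, that boils down to a single line in zeppelin-env.sh (the exact libexec path may differ depending on the installed version):

    export SPARK_HOME=/usr/local/opt/apache-spark/libexec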

    Reading the documentation would have helped, I guess.

    Simply add a --jars option to SPARK_SUBMIT_OPTIONS, specifying the JAR to be loaded. Alternatively, create a SPARK_HOME/conf/spark-defaults.conf file where you specify the files, packages and jars to be loaded.
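
    For example, mirroring the jar and package from the question:

    export SPARK_SUBMIT_OPTIONS="--packages com.databricks:spark-csv_2.10:1.2.0 --jars /usr/local/opt/mysql-connector-java/libexec/mysql-connector-java-5.1.32-bin.jar"

    Or, as a sketch of the spark-defaults.conf variant (spark.jars.packages requires a Spark version that supports it, i.e. 1.5 or later):

    # SPARK_HOME/conf/spark-defaults.conf
    spark.jars            /usr/local/opt/mysql-connector-java/libexec/mysql-connector-java-5.1.32-bin.jar
    spark.jars.packages   com.databricks:spark-csv_2.10:1.2.0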