Tags: hadoop, apache-spark, spark-streaming, hadoop-yarn, apache-zeppelin

Loading external dependencies in Zeppelin on Spark in Yarn-Client mode on Ubuntu 14.04


Dear community! Before I describe the problem, here's a short description of the software in use (the latter two running in a small cluster of three nodes, each of them on Ubuntu 14.04):

  • Zeppelin 0.6.1
  • Spark 2.0.0 along with Scala 2.11.8
  • Hadoop 2.7.3

The situation is as follows: In order to use the TwitterUtils class in a Spark Streaming application written in a Zeppelin note, I need to import org.apache.spark.streaming.twitter._, which is provided by the Maven artifact org.apache.bahir:spark-streaming-twitter_2.11:2.0.0-preview. From what I've learned so far, there are several options for making external dependencies available in Zeppelin:

  • Export the SPARK_SUBMIT_OPTIONS variable in conf/zeppelin-env.sh and set --jars; in my case --jars hdfs://admdsmaster:54310/global/jars/spark-streaming-twitter_2.11-2.0.0-preview.jar (a path on the local file system was tested as well). Concrete snippets for the first three options follow the list.
  • Export SPARK_SUBMIT_OPTIONS and set --packages (in my case --packages org.apache.bahir:spark-streaming-twitter_2.11:2.0.0-preview).
  • Set spark.jars or spark.jars.packages in conf/spark-defaults.conf with the values mentioned above.
  • Use the %dep interpreter in Zeppelin itself like so: z.load("org.apache.bahir:spark-streaming-twitter_2.11:2.0.0-preview"). This is deprecated, though.
  • Use sc.addJar() in the Zeppelin note to manually add a .jar file.
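
For reference, here is roughly what the first three options looked like in my setup (same coordinates and paths as above; use either --jars or --packages, not both):

    # conf/zeppelin-env.sh -- options 1 and 2:
    export SPARK_SUBMIT_OPTIONS="--jars hdfs://admdsmaster:54310/global/jars/spark-streaming-twitter_2.11-2.0.0-preview.jar"
    # or:
    export SPARK_SUBMIT_OPTIONS="--packages org.apache.bahir:spark-streaming-twitter_2.11:2.0.0-preview"

    # conf/spark-defaults.conf -- option 3:
    spark.jars           hdfs://admdsmaster:54310/global/jars/spark-streaming-twitter_2.11-2.0.0-preview.jar
    spark.jars.packages  org.apache.bahir:spark-streaming-twitter_2.11:2.0.0-preview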

After trying all of the above -- and nearly every combination and variation thereof -- the problem is that I still can't import the TwitterUtils class from within a Zeppelin note:

[Image: Class import failing in Zeppelin note.]

The picture also shows the output of sc.listJars(), which confirms that the .jar file was actually included. Nonetheless, the class import fails.
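
In lieu of the screenshot, the note contained roughly the following (the exact compile error is paraphrased, not copied verbatim):

    // Zeppelin note, Spark (Scala) interpreter
    sc.listJars().foreach(println)
    // output includes .../spark-streaming-twitter_2.11-2.0.0-preview.jar

    // ...and yet the import fails to compile, with an error along the lines of
    // "object twitter is not a member of package org.apache.spark.streaming":
    import org.apache.spark.streaming.twitter._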

My first thought was that the problem occurs because Spark is running in yarn-client mode, so I started the Spark shell in yarn-client mode as well and tried to import the TwitterUtils class from there -- which worked:

[Image: Class import working from Spark shell.]
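
Concretely, the shell session looked roughly like this (the --packages coordinate is the same as above):

    // launched from the command line in yarn-client mode, e.g.:
    //   spark-shell --master yarn --deploy-mode client \
    //     --packages org.apache.bahir:spark-streaming-twitter_2.11:2.0.0-preview

    scala> import org.apache.spark.streaming.twitter._
    import org.apache.spark.streaming.twitter._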

In order to find out what's going on, I searched the log files of Zeppelin, Spark and YARN, but couldn't find any error messages to point me to the cause of the problem.

Long story short: Although the .jar file was included in Zeppelin (as shown by sc.listJars()) and although the class import works from the spark-shell in yarn-client mode, I just can't get the import to work from within my Zeppelin note.

Long story even shorter: I'd really appreciate your ideas on how to solve this problem!

Thanks in advance for your time and effort.

P.S.: I'm sorry that I could not upload the images to this post directly -- it says I need at least 10 reputation points, which I do not have, as this is my first ever post here.


Solution

Adding the dependency from the interpreter tab, as proposed by @eliasah, actually did the trick -- thank you very much!

For those who might run into the same problem, I'll briefly describe the solution and add a picture of what a call to sc.listJars() should actually look like (compared to the picture in the original question).

Head over to Zeppelin's interpreter tab and scroll down or search for the spark interpreter, then hit edit. At the very bottom of the available settings there is a Dependencies section. Add your dependency there by specifying its Maven coordinates (in my case org.apache.bahir:spark-streaming-twitter_2.11:2.0.0-preview) and save the settings. After restarting the interpreter, the dependency should be available.
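
Once the interpreter has restarted, a quick smoke test in a note confirms that the dependency is usable. This is only a minimal sketch; passing None for the authorization makes twitter4j fall back to credentials supplied via its system properties, which you need to have configured:

    import org.apache.spark.streaming._
    import org.apache.spark.streaming.twitter._

    // the import now compiles; as a further check, wire up a stream
    val ssc = new StreamingContext(sc, Seconds(10))
    val tweets = TwitterUtils.createStream(ssc, None)  // None: twitter4j default auth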

Here's what a call to sc.listJars() looked like in my case after executing the steps described above:

[Image: Call to sc.listJars().]

If you compare this picture to the first one in the original question, you'll notice that the list now contains many more entries. I'm still wondering, though, why the class import did not work when only the .jar file containing the class was present. Anyway, problem solved thanks to @eliasah -- thanks again, you deserve a cookie! -- and I hope this short description helps others as well.