apache-spark, spark-streaming, hadoop2, apache-zeppelin

Use Apache Zeppelin with existing Spark Cluster


I want to install Zeppelin to use my existing Spark cluster. My setup is:

  • Spark Master (Spark 1.5.0 for Hadoop 2.4):
    • Zeppelin 0.5.5
  • Spark Slave

I downloaded Zeppelin v0.5.5 and built it via:

mvn clean package -Pspark-1.5 -Dspark.version=1.5.0 -Dhadoop.version=2.4.0 -Phadoop-2.4 -DskipTests

I noticed that the local[*] master setting also works without my Spark cluster (the notebook is still runnable after shutting down the Spark cluster).

My problem: when I want to use my Spark cluster for a streaming application, it does not seem to work correctly. My SQL table stays empty when I use spark://my_server:7077 as master; in local mode everything works fine!

See also my other question which describes the problem: Apache Zeppelin & Spark Streaming: Twitter Example only works local

Did I do something wrong

  • in the installation via "mvn clean package"?
  • in setting the master URL?
  • with the Spark and/or Hadoop version (are there any limitations)?
  • Do I have to set something special in the zeppelin-env.sh file (it is currently back on defaults)?
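For reference, this is roughly how I would expect conf/zeppelin-env.sh to look when pointing Zeppelin at an external cluster. The paths below are placeholders for my environment, not values from any documentation:

```shell
# conf/zeppelin-env.sh -- sketch; adjust paths/hostnames to your cluster
export MASTER=spark://my_server:7077   # point the Spark interpreter at the cluster instead of local[*]
export SPARK_HOME=/opt/spark-1.5.0     # assumption: location of the matching Spark 1.5.0 installation
```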

Solution

  • The problem was caused by a missing library dependency! So before searching around for too long, first check the dependencies and whether one is missing!

    %dep
    z.reset
    z.load("org.apache.spark:spark-streaming-twitter_2.10:1.5.1")
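With the dependency loaded, a minimal Twitter streaming paragraph might look like the following sketch. It assumes the twitter4j OAuth credentials have already been set as system properties in an earlier paragraph; sc is the SparkContext Zeppelin provides:

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.twitter.TwitterUtils

// Batch interval of 2 seconds; sc comes from the Zeppelin Spark interpreter
val ssc = new StreamingContext(sc, Seconds(2))

// None: pick up twitter4j.oauth.* credentials from system properties
val tweets = TwitterUtils.createStream(ssc, None)

// Sanity check: print tweet texts on the driver before building SQL tables
tweets.map(_.getText).print()

ssc.start()
```

If this paragraph fails with a ClassNotFoundException for the Twitter classes, the %dep block above was not run (or was run after the interpreter had already started), which matches the symptom of an empty SQL table.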