Tags: apache-spark, apache-zeppelin

zeppelin notebook "error: not found: value %"


According to "reading csv in zeppelin", I should be using %dep to load the spark-csv jar, but I get error: not found: value %. Does anyone know what I'm missing?

%spark

val a = 1

%dep
z.reset()
z.addRepo("Spark Packages Repo").url("http://dl.bintray.com/spark-packages/maven")
z.load("com.databricks:spark-csv_2.10:1.2.0")

a: Int = 1
<console>:28: error: not found: value %
              %dep
              ^

In the Zeppelin logs I see:

 INFO [2016-04-21 11:44:19,300] ({pool-2-thread-11} SchedulerFactory.java[jobFinished]:137) - Job remoteInterpretJob_1461228259278 finished by scheduler org.apache.zeppelin.spark.SparkInterpreter1173192611
 INFO [2016-04-21 11:44:19,678] ({pool-2-thread-4} SchedulerFactory.java[jobStarted]:131) - Job remoteInterpretJob_1461228259678 started by scheduler org.apache.zeppelin.spark.SparkInterpreter1173192611
 INFO [2016-04-21 11:44:19,704] ({pool-2-thread-4} SchedulerFactory.java[jobFinished]:137) - Job remoteInterpretJob_1461228259678 finished by scheduler org.apache.zeppelin.spark.SparkInterpreter1173192611
 INFO [2016-04-21 11:44:36,968] ({pool-2-thread-12} SchedulerFactory.java[jobStarted]:131) - Job remoteInterpretJob_1461228276968 started by scheduler 1367682354
 INFO [2016-04-21 11:44:36,969] ({pool-2-thread-12} RReplInterpreter.scala[liftedTree1$1]:41) - intrpreting %dep
z.reset()
z.addRepo("Spark Packages Repo").url("http://dl.bintray.com/spark-packages/maven")
z.load("com.databricks:spark-csv_2.10:1.2.0")
ERROR [2016-04-21 11:44:36,975] ({pool-2-thread-12} RClient.scala[eval]:79) - R Error .zreplout <- rzeppelin:::.z.valuate(.zreplin) <text>:1:1: unexpected input
1: %dep
    ^
 INFO [2016-04-21 11:44:36,978] ({pool-2-thread-12} SchedulerFactory.java[jobFinished]:137) - Job remoteInterpretJob_1461228276968 finished by scheduler 1367682354
 INFO [2016-04-21 11:45:22,157] ({pool-2-thread-8} SchedulerFactory.java[jobStarted]:131) - Job remoteInterpretJob_1461228322157 started by scheduler org.apache.zeppelin.spark.SparkInterpreter1173192611

Solution

  • Each cell can hold only one type of interpreter. Thus, in order to use both %dep and %spark, you should separate them into two cells, starting with the %dep cell, and run them after restarting the Spark interpreter so the dependency load can be taken into account. e.g.:

    In the first cell:

    %dep
    z.reset()
    z.addRepo("Spark Packages Repo").url("http://dl.bintray.com/spark-packages/maven")
    z.load("com.databricks:spark-csv_2.10:1.2.0")
    

    Now that your dependencies are loaded, you can use the Spark interpreter in a different cell:

    %spark
    val a = 1
    

    PS: By default, a cell runs with the Spark interpreter, so you don't need to explicitly use %spark.
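
    Once the dependency is loaded, a later cell can use the spark-csv data source to actually read a file. A minimal sketch for Spark 1.x in Zeppelin, where sqlContext is predefined; the path /tmp/test.csv is illustrative, not from the question:

    ```scala
    %spark
    // Read a CSV using the com.databricks:spark-csv package loaded via %dep.
    // "/tmp/test.csv" is a placeholder path; point it at your own file.
    val df = sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")       // treat the first line as column names
      .option("inferSchema", "true")  // infer column types instead of all-strings
      .load("/tmp/test.csv")

    df.printSchema()
    df.show()
    ```

    Remember that this only works if the %dep cell ran before the Spark interpreter was first used (or after restarting it); otherwise z.load has no effect on the running interpreter.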