apache-spark apache-spark-sql apache-zeppelin

Zeppelin and SqlContext

I have a very simple Zeppelin notebook with three paragraphs. It is based on the Zeppelin-Demo notebook; the only difference is that the bankText RDD is created using the textFile method.

Paragraph 1:

%sh
wget http://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank.zip
unzip bank.zip

Paragraph 2:

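// Read the extracted CSV rather than the zip archive itself
// (bank.zip unpacks to bank-full.csv and bank.csv)
val bankText = sc.textFile("bank-full.csv")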

case class BankRow(age: Integer, job: String, marital: String, education: String, balance: Integer)

val bank2 = bankText.map(s => s.split(";")).filter(s => s(0) != "\"age\"").map(
    s => BankRow(s(0).toInt, 
            s(1).replaceAll("\"", ""),
            s(2).replaceAll("\"", ""),
            s(3).replaceAll("\"", ""),
            s(5).replaceAll("\"", "").toInt
        )
).toDF()
bank2.registerTempTable("bank2")

Paragraph 3:

%sql 
select age, count(1) value
from bank2 
where age < 30
group by age 
order by age

Paragraphs 1 and 2 run fine – but the third paragraph errors with:

org.apache.spark.sql.AnalysisException: no such table bank2; line 2 pos 5
    at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.getTable(Analyzer.scala:260)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$7.applyOrElse(Analyzer.scala:268)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$7.applyOrElse(Analyzer.scala:264)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:57)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:57)…

The Zeppelin demo works just fine. I am running this on my sandbox, which uses Spark 1.5.2 built for Hadoop 2.6 (spark-1.5.2-bin-hadoop2.6.tgz) and Zeppelin 0.5.5, again from a binary distribution (zeppelin-0.5.5-incubating-bin-all.tgz).

I suspect that it has something to do with the SqlContext, since I believe Zeppelin injects its own SqlContext.

Any tips? Feels like I am missing something very simple.

Solution

I have figured out the solution to the problem. There seems to be a bug in Zeppelin that I will need to reproduce and report to the team. If you are new to Zeppelin (like me!) and create your own sqlContext, you effectively break the notebook: until you restart the kernel, all tables get registered in the wrong context, so subsequent paragraphs do not have the table in scope. Restarting the kernel fixed the problem.
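
For anyone hitting the same thing, here is a rough sketch of what I believe the safe pattern looks like in a Spark paragraph. It assumes Zeppelin has already injected `sc` and `sqlContext` and imported the implicits needed for toDF(), as it appears to in the demo notebook. The Person class and the demo_people table name are made up for illustration and are not part of my notebook.

```scala
// Zeppelin Spark paragraph. Assumption: `sc` and `sqlContext` are the
// instances provided by Zeppelin, so nothing is constructed here.

// What seems to break the notebook (do NOT run this):
//   val sqlContext = new org.apache.spark.sql.SQLContext(sc)
// It shadows the injected context, and tables registered through it are
// not visible to later %sql paragraphs.

// Hypothetical data, only for illustration.
case class Person(name: String, age: Int)

val people = sc.parallelize(Seq(Person("alice", 25), Person("bob", 41))).toDF()
people.registerTempTable("demo_people")

// Sanity check: ask the injected context whether it knows the table.
println(sqlContext.tableNames().contains("demo_people"))
```

If the check reports that the table is missing even though registerTempTable ran without error, that suggests a shadowing sqlContext is still in scope from an earlier paragraph, and a restart is probably needed before %sql will see the table.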