Tags: apache-spark, cassandra, apache-spark-sql, spark-cassandra-connector

Spark SQL - registered temporary table not found


I run the following command:

spark-shell --packages datastax:spark-cassandra-connector:1.6.0-s_2.10

Then I stop the context with:

sc.stop

Then I run this code in the REPL:

val conf = new org.apache.spark.SparkConf(true).set("spark.cassandra.connection.host", "127.0.0.1")
val sc = new org.apache.spark.SparkContext(conf)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val cc = new org.apache.spark.sql.cassandra.CassandraSQLContext(sc)

cc.setKeyspace("ksp")

cc.sql("SELECT * FROM continents").registerTempTable("conts")

val allContinents = sqlContext.sql("SELECT * FROM conts").collect

And I get:

org.apache.spark.sql.AnalysisException: Table not found: conts;

The keyspace ksp and the table continents are defined in Cassandra, so I suspect the error isn't on that side.

(Spark 1.6.0, 1.6.1)


Solution

  • You use one context (cc) to create and register the DataFrame, but a different one (sqlContext) to run the query. Temporary tables are registered per context, so conts only exists in cc's catalog and the plain sqlContext cannot see it. Run the query through cc instead:

    val conf = new org.apache.spark.SparkConf(true).set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new org.apache.spark.SparkContext(conf)
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    val cc = new org.apache.spark.sql.cassandra.CassandraSQLContext(sc)
    
    cc.setKeyspace("ksp")
    
    cc.sql("SELECT * FROM continents").registerTempTable("conts")
    
    // use cc instead of sqlContext
    val allContinents = cc.sql("SELECT * FROM conts").collect
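
    Alternatively, if you prefer to stay with a plain SQLContext, you can load the Cassandra table through the connector's DataFrame reader instead of CassandraSQLContext, and then both the registration and the query go through the same context. A minimal sketch, assuming the same ksp.continents table and the connector's org.apache.spark.sql.cassandra data source format:

        import org.apache.spark.{SparkConf, SparkContext}
        import org.apache.spark.sql.SQLContext

        val conf = new SparkConf(true).set("spark.cassandra.connection.host", "127.0.0.1")
        val sc = new SparkContext(conf)
        val sqlContext = new SQLContext(sc)

        // Load ksp.continents as a DataFrame via the connector's data source
        val continents = sqlContext.read
          .format("org.apache.spark.sql.cassandra")
          .options(Map("keyspace" -> "ksp", "table" -> "continents"))
          .load()

        // The temp table is registered in sqlContext's own catalog,
        // so querying it through that same context works
        continents.registerTempTable("conts")
        val allContinents = sqlContext.sql("SELECT * FROM conts").collect()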