Tags: apache-spark, ibm-cloud

You must build Spark with Hive. Export 'SPARK_HIVE=true'


I'm trying to run a notebook on Analytics for Apache Spark running on Bluemix, but I hit the following error:

Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and 
run build/sbt assembly", Py4JJavaError(u'An error occurred while calling 
None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o38))

The error is intermittent. The line of code that triggers it is:

df = sqlContext.read.format('jdbc').options(
    url=url,
    driver='com.ibm.db2.jcc.DB2Driver',
    dbtable='SAMPLE.ASSETDATA'
).load()

There are a few similar questions on Stack Overflow, but none of them are about the Spark service on Bluemix.


Solution

  • Create a new SQLContext object before using sqlContext:

    from pyspark.sql import SQLContext
    sqlContext = SQLContext(sc)
    

    and then run the code again.

    This error happens if you have multiple notebooks using the out-of-the-box sqlContext.
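
For reference, here is a rough sketch of how the fix and the original read could be combined in one notebook cell. The helper names `fresh_sql_context` and `load_jdbc_table` and the `context_factory` parameter are made up for illustration and are not part of PySpark. The sketch assumes `sc` and `url` are already defined in the notebook, as in the question.

```python
def fresh_sql_context(sc, context_factory=None):
    """Return a new SQLContext bound to the given SparkContext.

    `context_factory` is a hypothetical hook so this can be tried without
    Spark installed; by default pyspark.sql.SQLContext is used.
    """
    if context_factory is None:
        # Imported lazily; this line needs pyspark to be available.
        from pyspark.sql import SQLContext
        context_factory = SQLContext
    return context_factory(sc)


def load_jdbc_table(sql_context, url, dbtable,
                    driver='com.ibm.db2.jcc.DB2Driver'):
    """Read a table over JDBC with the same call chain as in the question."""
    return sql_context.read.format('jdbc').options(
        url=url,
        driver=driver,
        dbtable=dbtable,
    ).load()


# In the notebook, where `sc` and `url` already exist:
# sqlContext = fresh_sql_context(sc)
# df = load_jdbc_table(sqlContext, url, 'SAMPLE.ASSETDATA')
```

The point is only that the read goes through a context created in the current notebook, not the shared out-of-the-box one.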