
HiveContext in Bluemix Spark


In Bluemix Spark, I want to use a HiveContext:

from pyspark.sql import HiveContext

HqlContext = HiveContext(sc)
# some code
df = HqlContext.read.parquet("swift://notebook.spark/file.parquet")

I get the following error:

Py4JJavaError: An error occurred while calling o45.parquet. : java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient


Solution

  • The HiveContext is not included by default in the Bluemix Spark offering.

    To include it in your notebook, you should be able to use %AddJar to load it from a publicly accessible server, e.g.:

    %AddJar http://my.server.com/jars/spark-hive_2.10-1.5.2.jar
    

    You can also point this directly at the Maven repository:

    %AddJar http://repo1.maven.org/maven2/org/apache/spark/spark-hive_2.10/1.5.2/spark-hive_2.10-1.5.2.jar
    val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
    

    This enables the HiveContext for me.

    Note that the latest versions available on Maven probably don't line up with the version of Spark currently running on Bluemix, so check the Spark version on Bluemix with:

    sc.version
    

    Then match the version of this JAR to that version of Spark.
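
    As a sketch of the full sequence in a Scala notebook (the jar URL below assumes sc.version reports 1.5.2, as in the example above; substitute whatever version you actually see):

    // 1. Confirm the Spark version running on Bluemix, e.g. "1.5.2"
    sc.version

    // 2. Load a spark-hive jar that matches that version (URL assumes 1.5.2)
    %AddJar http://repo1.maven.org/maven2/org/apache/spark/spark-hive_2.10/1.5.2/spark-hive_2.10-1.5.2.jar

    // 3. Create the HiveContext and read the parquet file from the question
    val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
    val df = hiveContext.read.parquet("swift://notebook.spark/file.parquet")
    df.printSchema()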