
Spark with Hive: "Unable to instantiate SparkSession with Hive support because Hive classes are not found"

The Spark app loads data from Hive:

    SparkSession spark = SparkSession.builder()
        .config("hive.metastore.uris", "thrift://device1:9083")
        .enableHiveSupport()   // the call that throws in the stack trace below
        .getOrCreate();

I start the Spark job with:

    spark-submit --master local[*] --class zhihu.SparkConsumer target/original-kafka-consumer-0.1-SNAPSHOT.jar --jars spark-hive_2.11-2.4.4.jar

The Maven pom.xml:

    <project xmlns="" xmlns:xsi=""
      <!-- Proper URL for Cloudera maven artifactory -->
      <dependency> <!-- Spark dependency -->
      <dependency> <!-- Spark dependency -->
      <!-- gson -->
      <!-- runtime Hive -->

Everything looks fine to me, but it always fails with:

20/05/07 12:03:17 INFO spark.SparkContext: Added JAR file:/data/projects/zhihu_scraper/consumers/target/original-kafka-consumer-0.1-SNAPSHOT.jar at spark://device2:42395/jars/original-kafka-consumer-0.1-SNAPSHOT.jar with timestamp 1588824197724
20/05/07 12:03:17 INFO executor.Executor: Starting executor ID driver on host localhost
20/05/07 12:03:17 INFO util.Utils: Successfully started service '' on port 33849.
20/05/07 12:03:17 INFO netty.NettyBlockTransferService: Server created on device2:33849
20/05/07 12:03:17 INFO storage.BlockManager: Using for block replication policy
20/05/07 12:03:17 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, device2, 33849, None)
20/05/07 12:03:17 INFO storage.BlockManagerMasterEndpoint: Registering block manager device2:33849 with 366.3 MB RAM, BlockManagerId(driver, device2, 33849, None)
20/05/07 12:03:17 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, device2, 33849, None)
20/05/07 12:03:17 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, device2, 33849, None)
20/05/07 12:03:17 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@63e5e5b4{/metrics/json,null,AVAILABLE,@Spark}
Exception in thread "main" java.lang.IllegalArgumentException: Unable to instantiate SparkSession with Hive support because Hive classes are not found.
    at org.apache.spark.sql.SparkSession$Builder.enableHiveSupport(SparkSession.scala:869)
    at zhihu.SparkConsumer.main(
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(
    at java.lang.reflect.Method.invoke(
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
20/05/07 12:03:18 INFO spark.SparkContext: Invoking stop() from shutdown hook

I have tried all the answers in the post "How to create SparkSession with Hive support", but none of them worked for me.


  • <dependency> <!-- Spark dependency -->

    I don't know why the scope is compile; it should be runtime. Since you are using the Maven Shade plugin, you can package an uber jar that bundles all dependencies into one archive, so everything is on the classpath and nothing is missed. Note that the Shade plugin keeps the unshaded build as original-<name>.jar, so target/original-kafka-consumer-0.1-SNAPSHOT.jar does not contain the bundled dependencies; submit the shaded jar instead. Try this.

    Also, hive-site.xml should be on the classpath. Then there is no need to configure the metastore URIs separately in code.
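
To see whether the Hive classes and hive-site.xml really are visible to the application, a small classpath check can help. The following is only a sketch: the class name HiveClasspathCheck and its methods are hypothetical, and the two class names it looks for are an assumption about what Spark 2.4.x checks before enabling Hive support.

```java
import java.util.ArrayList;
import java.util.List;

// Diagnostic sketch: reports which Hive-related classes cannot be found on the classpath.
final class HiveClasspathCheck {

    // Assumption: the classes Spark 2.4.x tries to load before enabling Hive support.
    static final String[] REQUIRED_CLASSES = {
        "org.apache.spark.sql.hive.HiveSessionStateBuilder",
        "org.apache.hadoop.hive.conf.HiveConf"
    };

    // Returns the subset of classNames that the application class loader cannot resolve.
    static List<String> missingClasses(String... classNames) {
        ClassLoader loader = HiveClasspathCheck.class.getClassLoader();
        List<String> missing = new ArrayList<>();
        for (String name : classNames) {
            try {
                // 'false' avoids running static initializers; we only test visibility.
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | NoClassDefFoundError e) {
                missing.add(name);
            }
        }
        return missing;
    }

    // True if a hive-site.xml resource is visible at the classpath root.
    static boolean hiveSiteOnClasspath() {
        return HiveClasspathCheck.class.getClassLoader().getResource("hive-site.xml") != null;
    }

    public static void main(String[] args) {
        List<String> missing = missingClasses(REQUIRED_CLASSES);
        System.out.println(missing.isEmpty() ? "Hive classes found" : "Missing classes: " + missing);
        System.out.println("hive-site.xml on classpath: " + hiveSiteOnClasspath());
    }
}
```

The result is only meaningful when the check runs with the same classpath as the job, for example by calling missingClasses(REQUIRED_CLASSES) at the start of main in the submitted jar, before building the SparkSession.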
