apache-spark google-cloud-platform hbase google-cloud-bigtable

Spark-HBase - GCP template (3/3) - Missing libraries?


I'm trying to test the Spark-HBase connector on GCP, following the instructions that ask you to package the connector locally. After completing those steps, I get the following error when submitting the job on Dataproc.

Command

(base) gcloud dataproc jobs submit spark --cluster $SPARK_CLUSTER --class com.example.bigtable.spark.shc.BigtableSource --jars target/scala-2.11/cloud-bigtable-dataproc-spark-shc-assembly-0.1.jar --region us-east1 -- $BIGTABLE_TABLE

Error

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration


Solution

  • I found a way that works: adding the following dependencies to build.sbt. Thanks to @jccampanero for the guidance!

    libraryDependencies += "org.apache.hbase" % "hbase-common" % "2.0.2"
    libraryDependencies += "org.apache.hbase" % "hbase-mapreduce" % "2.0.2"
    

    Output (BigtableSource.scala)

    +------+-----+----+----+
    |  col0| col1|col2|col3|
    +------+-----+----+----+
    |row000| true| 0.0|   0|
    |row001|false| 1.0|   1|
    |row002| true| 2.0|   2|
    |row003|false| 3.0|   3|
    |row004| true| 4.0|   4|
    |row005|false| 5.0|   5|
    |row006| true| 6.0|   6|
    |row007|false| 7.0|   7|
    |row008| true| 8.0|   8|
    |row009|false| 9.0|   9|
    |row010| true|10.0|  10|
    |row011|false|11.0|  11|
    |row012| true|12.0|  12|
    |row013|false|13.0|  13|
    |row014| true|14.0|  14|
    |row015|false|15.0|  15|
    |row016| true|16.0|  16|
    |row017|false|17.0|  17|
    |row018| true|18.0|  18|
    |row019|false|19.0|  19|
    +------+-----+----+----+
    only showing top 20 rows
    
    +------+-----+
    |  col0| col1|
    +------+-----+
    |row000| true|
    |row001|false|
    |row002| true|
    |row003|false|
    |row004| true|
    |row005|false|
    +------+-----+
    
    +------+-----+
    |  col0| col1|
    +------+-----+
    |row000| true|
    |row001|false|
    |row002| true|
    |row003|false|
    |row004| true|
    |row005|false|
    +------+-----+
    
    +------+-----+
    |  col0| col1|
    +------+-----+
    |row251|false|
    |row252| true|
    |row253|false|
    |row254| true|
    |row255|false|
    +------+-----+
    
    +-----------+
    |count(col1)|
    +-----------+
    |         50|
    +-----------+
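
For anyone hitting the same NoClassDefFoundError: it means the class named in the message was not on the job's classpath, which here suggests the HBase classes were not packaged into the assembly jar. `HBaseConfiguration` ships in `hbase-common`, which would explain why the dependencies above fixed it. Below is a minimal sketch of a check you could run before the real job to see which classes are visible. The object name `ClasspathCheck` and the helper `isLoadable` are hypothetical names I made up for illustration, and `TableInputFormat` is only my guess at a representative class from `hbase-mapreduce`; adjust the list to whatever your error message names.

```scala
object ClasspathCheck {
  // Hypothetical helper: true if the class can be located on the current
  // classpath. The class is looked up without being initialized.
  def isLoadable(className: String): Boolean =
    try {
      Class.forName(className, false, getClass.getClassLoader)
      true
    } catch {
      // ClassNotFoundException: the class itself is missing.
      // LinkageError (includes NoClassDefFoundError): something it needs is missing.
      case _: ClassNotFoundException | _: LinkageError => false
    }

  def main(args: Array[String]): Unit = {
    val required = Seq(
      // Named in the error message above; provided by hbase-common.
      "org.apache.hadoop.hbase.HBaseConfiguration",
      // Assumption: a representative class from hbase-mapreduce (HBase 2.x).
      "org.apache.hadoop.hbase.mapreduce.TableInputFormat"
    )
    required.foreach { name =>
      val status = if (isLoadable(name)) "found" else "MISSING"
      println(s"$name -> $status")
    }
  }
}
```

If a class is reported as MISSING when this runs from the assembly jar, the module that provides it is probably not being bundled, and adding it to `libraryDependencies` as in the solution above is the first thing to try.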