java, scala, apache-spark, google-cloud-platform, google-cloud-bigtable

Bigtable from Dataproc: Dependency conflict even after shading the jars


I am trying to run a Spark Application to write and read data to Cloud Bigtable from Dataproc.

Initially, I got this exception: java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument. I then learned that this is caused by dependency conflicts, as described in the Google documentation [Manage Java and Scala dependencies for Apache Spark][1].
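Before shading, it can help to confirm which jar the conflicting class is actually loaded from, since Dataproc ships its own (older) Guava on the cluster classpath. This is a small diagnostic sketch of my own (the `WhichJar` name is not from any library); running it in a spark-shell on the cluster shows which Guava wins:

```scala
// Diagnostic sketch: report which jar (or directory) a class was loaded
// from. The code source is null for bootstrap-classpath classes, hence
// the Option. In spark-shell you could call, e.g.:
//   WhichJar.locationOf(classOf[com.google.common.base.Preconditions])
// to see whether your Guava or the cluster's Guava is being used.
object WhichJar {
  def locationOf(c: Class[_]): Option[String] =
    Option(c.getProtectionDomain.getCodeSource).map(_.getLocation.toString)
}
```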

Following the instructions, I changed my build.sbt file to shade (relocate) the conflicting packages:

assembly / assemblyShadeRules := Seq(
  ShadeRule.rename("com.google.common.**" -> "repackaged.com.google.common.@1").inAll,
  ShadeRule.rename("com.google.protobuf.**" -> "repackaged.com.google.protobuf.@1").inAll,
  ShadeRule.rename("io.grpc.**" -> "repackaged.io.grpc.@1").inAll
)

Then I got this error:

repackaged.io.grpc.ManagedChannelProvider$ProviderNotFoundException: No functional channel service provider found. Try adding a dependency on the grpc-okhttp, grpc-netty, or grpc-netty-shaded artifact
  at repackaged.io.grpc.ManagedChannelProvider.provider(ManagedChannelProvider.java:45)
  at repackaged.io.grpc.ManagedChannelBuilder.forAddress(ManagedChannelBuilder.java:39)
  at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.createSingleChannel(InstantiatingGrpcChannelProvider.java:353)
  at com.google.api.gax.grpc.ChannelPool.<init>(ChannelPool.java:107)
  at com.google.api.gax.grpc.ChannelPool.create(ChannelPool.java:85)
  at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.createChannel(InstantiatingGrpcChannelProvider.java:237)
  at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.getTransportChannel(InstantiatingGrpcChannelProvider.java:231)
  at com.google.api.gax.rpc.ClientContext.create(ClientContext.java:201)
  at com.google.cloud.bigtable.data.v2.stub.EnhancedBigtableStub.create(EnhancedBigtableStub.java:175)
  at com.google.cloud.bigtable.data.v2.BigtableDataClient.create(BigtableDataClient.java:165)
  at com.groupon.crm.BigtableClient$.getDataClient(BigtableClient.scala:59)
  ... 44 elided

Following that, I added the suggested grpc-netty dependency to my build.sbt file:

libraryDependencies += "io.grpc" % "grpc-netty" % "1.49.2"

Still, I am getting the same error.
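My reading of the root cause (an assumption on my part, based on the stack trace): gRPC locates its channel implementations at runtime through Java's ServiceLoader, which reads provider class names from text files under META-INF/services/ in the jar. Shading rewrites the bytecode references, but if the assembly's merge strategy drops those service files, or they still name the un-relocated classes, the lookup comes back empty no matter which grpc-netty artifact is declared. A generic sketch of the discovery mechanism:

```scala
import java.util.ServiceLoader
import scala.collection.JavaConverters._

// Sketch of the mechanism gRPC relies on: ServiceLoader scans every
// META-INF/services/<interface-name> file on the classpath and
// instantiates the classes listed there. An empty result for
// io.grpc.ManagedChannelProvider is exactly what produces the
// ProviderNotFoundException above.
object ServiceDiscovery {
  def providersOf[T](service: Class[T]): List[String] =
    ServiceLoader.load(service).asScala.map(_.getClass.getName).toList
}

// e.g. ServiceDiscovery.providersOf(classOf[io.grpc.ManagedChannelProvider])
// would come back empty inside the failing assembly jar.
```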

Environment details

Dataproc cluster configuration:

"software_config": {
      "image_version": "1.5-debian10",
      "properties": {
        "dataproc:dataproc.logging.stackdriver.job.driver.enable": "true",
        "dataproc:dataproc.logging.stackdriver.enable": "true",
        "dataproc:jobs.file-backed-output.enable": "true",
        "dataproc:dataproc.logging.stackdriver.job.yarn.container.enable": "true",
        "capacity-scheduler:yarn.scheduler.capacity.resource-calculator" : "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator",
        "hive:hive.server2.materializedviews.cache.at.startup": "false",
        "spark:spark.jars":"XXXX"
      },
      "optional_components": ["ZEPPELIN","ANACONDA","JUPYTER"]
    }

Spark job dependencies (build.sbt):

val sparkVersion = "2.4.0"
libraryDependencies += "org.apache.spark" %% "spark-core" % sparkVersion % "provided"
libraryDependencies +=  "org.apache.spark" %% "spark-sql" % sparkVersion % "provided"
libraryDependencies +=  "org.apache.spark" %% "spark-hive" % sparkVersion % "provided"
libraryDependencies += "com.google.cloud" % "google-cloud-bigtable" % "2.23.1"
libraryDependencies += "com.google.auth" % "google-auth-library-oauth2-http" % "1.17.0"
libraryDependencies += "io.grpc" % "grpc-netty" % "1.49.2"

Solution

  • Finally, I solved the issue myself, with the following steps.

    1. Under src/main/resources, create a META-INF directory, and inside it a services directory.
    2. In the src/main/resources/META-INF/services directory, add 2 files, namely io.grpc.LoadBalancerProvider and io.grpc.NameResolverProvider.
    3. Add the following content to the io.grpc.LoadBalancerProvider file: io.grpc.internal.PickFirstLoadBalancerProvider.
    4. Add the following content to the io.grpc.NameResolverProvider file: io.grpc.internal.DnsNameResolverProvider.
    5. Finally, make the following changes to your build.sbt:
    libraryDependencies += "io.grpc" % "grpc-netty-shaded" % "1.55.1"
    
    assembly / assemblyShadeRules := Seq(
      ShadeRule.rename("com.google.protobuf.**" -> "shade_proto.@1").inAll,
      ShadeRule.rename("com.google.common.**" -> "shade_googlecommon.@1").inAll
    )
    
    assembly / assemblyMergeStrategy := {
      case path if path.contains("META-INF/services") => MergeStrategy.concat
      case PathList("META-INF", _*) => MergeStrategy.discard
      case _ => MergeStrategy.first
    }