Tags: scala, sbt, sbt-assembly

sbt assembly failing with error: object spark is not a member of package org.apache even though spark-core and spark-sql libraries are included


I am attempting to use sbt assembly on a Spark project. sbt compile and sbt package work, but when I attempt sbt assembly I get the following error:

object spark is not a member of package org.apache

I have included the spark-core and spark-sql libraries and have sbt-assembly in my plugins file. Why is assembly producing these errors?

build.sbt:

name := "redis-record-loader"

scalaVersion := "2.11.8"

val sparkVersion = "2.3.1"
val scalatestVersion = "3.0.3"
val scalatest = "org.scalatest" %% "scalatest" % scalatestVersion

libraryDependencies ++=
  Seq(
    "com.amazonaws" % "aws-java-sdk-s3" % "1.11.347",
    "com.typesafe" % "config" % "1.3.1",
    "net.debasishg" %% "redisclient" % "3.0",
    "org.slf4j" % "slf4j-log4j12" % "1.7.12",
    "org.apache.commons" % "commons-lang3" % "3.0" % "test,it",
    "org.apache.hadoop" % "hadoop-aws" % "2.8.1" % Provided,
    "org.apache.spark" %% "spark-core" % sparkVersion % Provided,
    "org.apache.spark" %% "spark-sql" % sparkVersion % Provided,
    "org.mockito" % "mockito-core" % "2.21.0" % Test,
    scalatest
)

val integrationTestsKey = "it"
val integrationTestLibs = scalatest % integrationTestsKey

lazy val IntegrationTestConfig = config(integrationTestsKey) extend Test

lazy val root = project.in(file("."))
  .configs(IntegrationTestConfig)
  .settings(inConfig(IntegrationTestConfig)(Defaults.testSettings): _*)
  .settings(libraryDependencies ++= Seq(integrationTestLibs))

test in assembly := Seq(
  (test in Test).value,
  (test in IntegrationTestConfig).value
)

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}

plugins.sbt:

logLevel := Level.Warn

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.6")

full error message:

/com/elsevier/bos/RedisRecordLoaderIntegrationSpec.scala:11: object spark is not a member of package org.apache
[error] import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}
[error]                   ^
[error] /Users/jones8/Work/redis-record-loader/src/it/scala/com/elsevier/bos/RedisRecordLoaderIntegrationSpec.scala:26: not found: type SparkSession
[error]   implicit val spark: SparkSession = SparkSession.builder
[error]                       ^
[error] /Users/jones8/Work/redis-record-loader/src/it/scala/com/elsevier/bos/RedisRecordLoaderIntegrationSpec.scala:26: not found: value SparkSession
[error]   implicit val spark: SparkSession = SparkSession.builder
[error]                                      ^
[error] /Users/jones8/Work/redis-record-loader/src/it/scala/com/elsevier/bos/RedisRecordLoaderIntegrationSpec.scala:51: not found: type DataFrame
[error]   val testDataframe0: DataFrame = testData0.toDF()
[error]                       ^
[error] /Users/jones8/Work/redis-record-loader/src/it/scala/com/elsevier/bos/RedisRecordLoaderIntegrationSpec.scala:51: value toDF is not a member of Seq[(String, String)]
[error]   val testDataframe0: DataFrame = testData0.toDF()
[error]                                             ^
[error] /Users/jones8/Work/redis-record-loader/src/it/scala/com/elsevier/bos/RedisRecordLoaderIntegrationSpec.scala:52: not found: type DataFrame
[error]   val testDataframe1: DataFrame = testData1.toDF()
[error]                       ^
[error] /Users/jones8/Work/redis-record-loader/src/it/scala/com/elsevier/bos/RedisRecordLoaderIntegrationSpec.scala:52: value toDF is not a member of Seq[(String, String)]
[error]   val testDataframe1: DataFrame = testData1.toDF()
[error]                                             ^
[error] missing or invalid dependency detected while loading class file 'RedisRecordLoader.class'.
[error] Could not access term spark in package org.apache,
[error] because it (or its dependencies) are missing. Check your build definition for
[error] missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the problematic classpath.)
[error] A full rebuild may help if 'RedisRecordLoader.class' was compiled against an incompatible version of org.apache.
[error] missing or invalid dependency detected while loading class file 'RedisRecordLoader.class'.
[error] Could not access type SparkSession in value org.apache.sql,
[error] because it (or its dependencies) are missing. Check your build definition for
[error] missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the problematic classpath.)
[error] A full rebuild may help if 'RedisRecordLoader.class' was compiled against an incompatible version of org.apache.sql.
[error] 9 errors found

Solution

  • I can't comment on the assembly error itself, but I can say that I doubt the AWS SDK & hadoop-aws versions are going to work together. The hadoop-aws version needs to exactly match the hadoop-common JAR on your classpath (it's all one project which releases in sync, after all), and the AWS SDK version that hadoop-aws 2.8.x was built against was 1.10, not 1.11.347. The AWS SDK has a habit of (a) breaking APIs on every point release, (b) aggressively pushing new versions of Jackson down, even when they are incompatible, and (c) causing regressions in the hadoop-aws code.

    If you really want to work with S3A, it's best to go for Hadoop 2.9, which pulls in a shaded 1.11.x version of the AWS SDK.
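
    As a rough sketch of what that alignment could look like in the asker's build.sbt (the 2.9.2 version number here is illustrative, not a tested recommendation; always check the POM of the hadoop-aws release you pick to confirm which AWS SDK it was built against):

    ```scala
    // Keep hadoop-aws in lock-step with whatever hadoop-common ends up on the
    // classpath; they ship from the same project and must match exactly.
    val hadoopVersion = "2.9.2" // illustrative: a Hadoop 2.9.x release

    libraryDependencies ++= Seq(
      // Hadoop 2.9.x pulls in the shaded aws-java-sdk-bundle transitively,
      // so the standalone "com.amazonaws" % "aws-java-sdk-s3" dependency
      // can be dropped rather than pinned by hand.
      "org.apache.hadoop" % "hadoop-aws"    % hadoopVersion % Provided,
      "org.apache.hadoop" % "hadoop-common" % hadoopVersion % Provided,
      "org.apache.spark" %% "spark-core"    % sparkVersion  % Provided,
      "org.apache.spark" %% "spark-sql"     % sparkVersion  % Provided
    )
    ```

    The point of declaring hadoop-common explicitly is to make any version conflict with the copy Spark drags in visible in the build rather than discovered at runtime.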