Spark Session Catalog Failure


I'm reading data in batch from a Cassandra database and in streaming from Azure Event Hubs, using the Spark Scala API.

session.read
  .format("org.apache.spark.sql.cassandra")
  .option("keyspace", keyspace)
  .option("table", table)
  .option("pushdown", pushdown)
  .load()

and

session.readStream
  .format("eventhubs")
  .options(eventHubsConf.toMap)
  .load()

Everything was running fine, but now I get this exception out of nowhere:

User class threw exception: java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.catalog.SessionCatalog.<init>(Lscala/Function0;Lscala/Function0;Lorg/apache/spark/sql/catalyst/analysis/FunctionRegistry;Lorg/apache/spark/sql/internal/SQLConf;Lorg/apache/hadoop/conf/Configuration;Lorg/apache/spark/sql/catalyst/parser/ParserInterface;Lorg/apache/spark/sql/catalyst/catalog/FunctionResourceLoader;)V
at org.apache.spark.sql.internal.BaseSessionStateBuilder.catalog$lzycompute(BaseSessionStateBuilder.scala:132)
at org.apache.spark.sql.internal.BaseSessionStateBuilder.catalog(BaseSessionStateBuilder.scala:131)
at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anon$1.<init>(BaseSessionStateBuilder.scala:157)
at org.apache.spark.sql.internal.BaseSessionStateBuilder.analyzer(BaseSessionStateBuilder.scala:157)
at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$2.apply(BaseSessionStateBuilder.scala:293)
at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$2.apply(BaseSessionStateBuilder.scala:293)
at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:79)
at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:79)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:74)
at org.apache.spark.sql.SparkSession.baseRelationToDataFrame(SparkSession.scala:428)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:233)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:164)

I don't know exactly what changed, but here are my dependencies:

ThisBuild / scalaVersion := "2.11.11"
val sparkVersion = "2.4.0"

libraryDependencies ++= Seq(
  "org.apache.logging.log4j" % "log4j-core" % "2.11.1",
  "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-sql" % sparkVersion  % "provided",
  "org.apache.spark" %% "spark-hive" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-catalyst" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
  "com.microsoft.azure" % "azure-eventhubs-spark_2.11" % "2.3.10",
  "com.microsoft.azure" % "azure-eventhubs" % "2.3.0",
  "com.datastax.spark" %% "spark-cassandra-connector" % "2.4.1",
  "org.scala-lang.modules" %% "scala-java8-compat" % "0.9.0",
  "com.twitter" % "jsr166e" % "1.1.0",
  "com.holdenkarau" %% "spark-testing-base" % "2.4.0_0.12.0" % Test,
  "MrPowers" % "spark-fast-tests" % "0.19.2-s_2.11" % Test
)

Does anyone have a clue?


Solution

  •    java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.catalog.SessionCatalog.<init>(
       Lscala/Function0;Lscala/Function0;
       Lorg/apache/spark/sql/catalyst/analysis/FunctionRegistry;
       Lorg/apache/spark/sql/internal/SQLConf;
       Lorg/apache/hadoop/conf/Configuration;
       Lorg/apache/spark/sql/catalyst/parser/ParserInterface;
       Lorg/apache/spark/sql/catalyst/catalog/FunctionResourceLoader;)V
    

    This suggests to me that one of the libraries was compiled against a version of Spark different from the one currently on the runtime classpath. The method signature above matches the Spark 2.4.0 signature but not the Spark 2.3.0 signature; see

    https://github.com/apache/spark/blob/v2.4.1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala#L56-L63

    My guess is that a Spark 2.3.0 runtime is involved somewhere. Perhaps you are launching the application with spark-submit from a Spark 2.3.0 install?
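
One way to test that guess is to ask the JVM, at runtime, which jar SessionCatalog was loaded from and which constructors it actually exposes. The following is a minimal sketch, not a definitive diagnostic: the object name SparkClasspathCheck and its helper methods are hypothetical, and it uses only plain JVM reflection, so it compiles without any Spark dependency. It assumes you run it inside the same spark-submit job that fails.

```scala
object SparkClasspathCheck {
  // Load a class by name without a compile-time Spark dependency.
  private def load(className: String): Option[Class[_]] =
    try Some(Class.forName(className))
    catch { case _: ClassNotFoundException | _: LinkageError => None }

  // Jar (or directory) the class was loaded from, if the JVM reports one.
  def locationOf(className: String): Option[String] =
    for {
      cls    <- load(className)
      source <- Option(cls.getProtectionDomain.getCodeSource)
      url    <- Option(source.getLocation)
    } yield url.toString

  // Parameter types of every public constructor, one string per constructor.
  def constructorSignatures(className: String): Seq[String] =
    load(className).toSeq.flatMap { cls =>
      cls.getConstructors.toSeq.map { ctor =>
        ctor.getParameterTypes.map(_.getName).mkString("(", ", ", ")")
      }
    }

  def main(args: Array[String]): Unit = {
    val target = "org.apache.spark.sql.catalyst.catalog.SessionCatalog"
    println(s"loaded from: ${locationOf(target).getOrElse("<not found>")}")
    constructorSignatures(target).foreach(sig => println(s"constructor: $sig"))
  }
}
```

If the printed jar path points at a Spark 2.3.x install, or none of the printed constructors starts with two scala.Function0 parameters, the runtime Spark is older than the 2.4.0 the build targets. Logging session.version from the running job is a quicker first check of the same thing.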