Tags: scala, apache-spark, apache-iceberg

Spark Shell not working after adding support for Iceberg


We are doing a POC on Iceberg and evaluating it for the first time.

Spark Environment:

  • Spark Standalone Cluster Setup (1 master and 5 workers)
  • Spark: spark-3.1.2-bin-hadoop3.2
  • Scala: 2.12.10
  • Java: 1.8.0_321
  • Hadoop: 3.2.0
  • Iceberg 0.13.1

As suggested in Iceberg's official documentation, to add support for Iceberg in the Spark shell, we add the Iceberg dependency when launching the shell as below:

spark-shell --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.13.1

After launching the Spark shell with the above command, we are not able to use the shell at all. Every command (even non-Iceberg ones) fails with the same exception:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/catalyst/plans/logical/BinaryCommand

Even the simple commands below throw the same exception:

import org.apache.spark.sql.DataFrame

val df: DataFrame = spark.read.json("/spark-3.1.2-bin-hadoop3.2/examples/src/main/resources/people.json")
df.show()

In the Spark source code, the BinaryCommand class belongs to the Spark SQL module, so we tried explicitly adding the Spark SQL dependency when launching the Spark shell as below, but we still get the same exception:

spark-shell --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.13.1,org.apache.spark:spark-sql_2.12:3.1.2

When we launch spark-shell normally, i.e. without the Iceberg dependency, it works properly.


Solution

  • We were using the wrong Iceberg version: we chose the Spark 3.2 Iceberg runtime jar while running Spark 3.1. That runtime is compiled against Catalyst classes introduced in Spark 3.2 (such as BinaryCommand), which do not exist in Spark 3.1, hence the NoClassDefFoundError. After switching to the correct dependency version (i.e. 3.1), we are able to launch the Spark shell with Iceberg. There is also no need to pass org.apache.spark jars via --packages, since they are already on the classpath.

    spark-shell --packages org.apache.iceberg:iceberg-spark-runtime-3.1_2.12:0.13.1
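The runtime artifact coordinate encodes both the Spark minor version (3.1) and the Scala binary version (2.12), and both must match the cluster exactly. As a sketch (the version-extraction logic here is illustrative, not part of Iceberg's tooling), the correct coordinate can be derived from the Spark version string:

```shell
# Illustrative only: build the Iceberg runtime coordinate from the
# Spark version actually deployed on the cluster.
spark_version="3.1.2"            # e.g. as reported by `spark-submit --version`
scala_binary="2.12"              # Scala binary version of the Spark build
iceberg_version="0.13.1"

# Strip the patch component: "3.1.2" -> "3.1"
major_minor="${spark_version%.*}"

pkg="org.apache.iceberg:iceberg-spark-runtime-${major_minor}_${scala_binary}:${iceberg_version}"
echo "$pkg"
```

Launching with `spark-shell --packages "$pkg"` then pulls a runtime built against the same Catalyst API the shell is running.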