
Unable to read Parquet files in Spark: java.lang.NoSuchMethodError: org.json4s.jackson.JsonMethods


I am trying to read a Snappy-compressed Parquet file, but I keep getting the exception below. I have not been able to find its root cause. Can someone please guide me?

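    import org.apache.spark.sql.SparkSession

    // Local session with the web UI disabled
    val sparkSession: SparkSession = SparkSession.builder()
      .master("local[2]")
      .config("spark.ui.enabled", false)
      .appName("local-intellij")
      .getOrCreate()

    // Reading the file triggers the schema parsing that fails
    val df = sparkSession.read.parquet("C:\\data\\parquet\\part-00000-4ce5708f-2f50-485d-8ae4-7c5ea440fda6.c000.snappy.parquet")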

My dependencies are:

    lazy val json4sVersion = "3.5.0"
    lazy val json4sDeps = Seq(
      "org.json4s" %% "json4s-core" % json4sVersion,
      "org.json4s" %% "json4s-native" % json4sVersion,
      "org.json4s" %% "json4s-ast" % json4sVersion,
      "org.json4s" %% "json4s-jackson" % json4sVersion)
    lazy val sparkVersionCore = "2.3.0.cloudera2"
    lazy val sparkDeps = Seq(
      "org.apache.spark" %% "spark-hive" % sparkVersionCore,
      "org.apache.spark" %% "spark-core" % sparkVersionCore)

    java.lang.NoSuchMethodError: org.json4s.jackson.JsonMethods$.parse(Lorg/json4s/JsonInput;Z)Lorg/json4s/JsonAST$JValue;
        at org.apache.spark.sql.types.DataType$.fromJson(DataType.scala:113)
        at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$org$apache$spark$sql$execution$datasources$parquet$ParquetFileFormat$$deserializeSchemaString$3.apply(ParquetFileFormat.scala:650)
        at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$org$apache$spark$sql$execution$datasources$parquet$ParquetFileFormat$$deserializeSchemaString$3.apply(ParquetFileFormat.scala:650)
        at scala.util.Try$.apply(Try.scala:192)
        at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.org$apache$spark$sql$execution$datasources$parquet$ParquetFileFormat$$deserializeSchemaString(ParquetFileFormat.scala:650)
        at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$readSchemaFromFooter$1.apply(ParquetFileFormat.scala:643)
        at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$readSchemaFromFooter$1.apply(ParquetFileFormat.scala:643)


Solution

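  • This is a clear jar version mismatch for Spark 2.3.0: json4s 3.5.0 is on the classpath, but Spark was compiled against a json4s release whose JsonMethods.parse takes (JsonInput, Boolean), and that method cannot be found at runtime.

    AFAIK, with Spark 2.3.0 you have to use org.json4s json4s-jackson_2.11 version 3.2.11: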

    // https://mvnrepository.com/artifact/org.json4s/json4s-jackson
    libraryDependencies += "org.json4s" %% "json4s-jackson" % "3.2.11"
    

    AFAIK this entry is not even required; I think it will be downloaded automatically as a transitive dependency once you specify the Spark version in sbt. First try removing your json4s entries and see. If that does not work, add the entry above.
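
    The two options above could look roughly like this in build.sbt. This is only a sketch, not a verified build: it reuses the sparkVersionCore and sparkDeps names from the question, and it assumes that 3.2.11 is the json4s version your Spark distribution was built against (worth confirming for the Cloudera build). The dependencyOverrides lines are an extra suggestion, not part of the original setup, and only matter if some other library pulls in a newer json4s.

```scala
// build.sbt (sketch under the assumptions stated above)
lazy val sparkVersionCore = "2.3.0.cloudera2"

lazy val sparkDeps = Seq(
  "org.apache.spark" %% "spark-hive" % sparkVersionCore,
  "org.apache.spark" %% "spark-core" % sparkVersionCore
)

// Option 1: declare no json4s entries at all and let Spark
// bring in the json4s version it was compiled against.
libraryDependencies ++= sparkDeps

// Option 2: if a newer json4s still ends up on the classpath,
// force the version Spark expects (3.2.11 is an assumption here).
dependencyOverrides += "org.json4s" %% "json4s-core"    % "3.2.11"
dependencyOverrides += "org.json4s" %% "json4s-ast"     % "3.2.11"
dependencyOverrides += "org.json4s" %% "json4s-jackson" % "3.2.11"
```

    After reloading the build, running the evicted task in the sbt shell should show which json4s version was actually selected.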