I am trying to read snappy compressed parquet file but keep on getting below exception. I am not able to find the root cause for this exception, can someone please guide me here?
val sparkSession: SparkSession = SparkSession.builder()
.master("local[2]")
.config("spark.ui.enabled",false)
.appName("local-intellij")
.getOrCreate()
val df = sparkSession.read.parquet("C:\\data\\parquet\\part-00000-4ce5708f-2f50-485d-8ae4-7c5ea440fda6.c000.snappy.parquet")
My dependencies are :
lazy val json4sVersion = "3.5.0"
lazy val json4sDeps = Seq(
"org.json4s" %% "json4s-core" % json4sVersion,
"org.json4s" %% "json4s-native" % json4sVersion,
"org.json4s" %% "json4s-ast" % json4sVersion,
"org.json4s" %% "json4s-jackson" % json4sVersion)
lazy val sparkVersionCore = "2.3.0.cloudera2"
lazy val sparkDeps = Seq(
"org.apache.spark" %% "spark-hive" % sparkVersionCore,
"org.apache.spark" %% "spark-core" % sparkVersionCore)
java.lang.NoSuchMethodError: org.json4s.jackson.JsonMethods$.parse(Lorg/json4s/JsonInput;Z)Lorg/json4s/JsonAST$JValue; at org.apache.spark.sql.types.DataType$.fromJson(DataType.scala:113) at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$org$apache$spark$sql$execution$datasources$parquet$ParquetFileFormat$$deserializeSchemaString$3.apply(ParquetFileFormat.scala:650) at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$org$apache$spark$sql$execution$datasources$parquet$ParquetFileFormat$$deserializeSchemaString$3.apply(ParquetFileFormat.scala:650) at scala.util.Try$.apply(Try.scala:192) at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.org$apache$spark$sql$execution$datasources$parquet$ParquetFileFormat$$deserializeSchemaString(ParquetFileFormat.scala:650) at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$readSchemaFromFooter$1.apply(ParquetFileFormat.scala:643) at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$readSchemaFromFooter$1.apply(ParquetFileFormat.scala:643)
clear jar version mismatch for spark version 2.3.0.
AFAIK you have to use org.json4s json4s-jackson_2.11 3.2.11
// https://mvnrepository.com/artifact/org.json4s/json4s-jackson
libraryDependencies += "org.json4s" %% "json4s-jackson" % "3.2.11"
AFAIK this entry is not reuqired I think will be auomatically downloaded once you mention the spark version in sbt. Just try to remove the entry and see... if its not working add aforementioned entry.