I am trying to read an XML file with Spark in an SBT project, but I get an error when I compile it. My build.sbt:
name:= "First Spark"
version:= "1.0"
organization := "in.goai"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.0.0"
libraryDependencies += "com.databricks" % "spark-avro_2.10" % "2.0.1"
libraryDependencies += "org.scala-lang.modules" %% "scala-xml" % "1.0.2"
resolvers += Resolver.mavenLocal
The .scala file:
package in.goai.spark
import scala.xml._
import com.databricks.spark.xml
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkContext, SparkConf}
object SparkMeApp {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("First Spark")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    val fileName = args(0)
    val df = sqlContext.read.format("com.databricks.spark.xml").option("rowTag", "book").load("fileName")
    val selectedData = df.select("title", "price")
    val d = selectedData.show
  }
}
When I compile it with "sbt package", I get the following error:
[error] /home/hadoop/dev/first/src/main/scala/SparkMeApp.scala:4: object xml is not a member of package com.databricks.spark
[error] import com.databricks.spark.xml
[error] ^
[error] one error found
[error] (compile:compileIncremental) Compilation failed
[error] Total time: 9 s, completed Sep 22, 2017 4:11:19 PM
Do I need to add any other jar files for XML? Please suggest a fix, and point me to any link that lists the jar files needed for different file formats.
Because you're using Scala 2.11 and Spark 2.0, change the dependencies in build.sbt to the following:
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.0.0"
libraryDependencies += "com.databricks" %% "spark-avro" % "3.2.0"
libraryDependencies += "com.databricks" %% "spark-xml" % "0.4.1"
libraryDependencies += "org.scala-lang.modules" %% "scala-xml" % "1.0.6"
Change the spark-avro version to 3.2.0: https://github.com/databricks/spark-avro#requirements
Add "com.databricks" %% "spark-xml" % "0.4.1": https://github.com/databricks/spark-xml#scala-211
Change the scala-xml version to 1.0.6, the current version for Scala 2.11: http://mvnrepository.com/artifact/org.scala-lang.modules/scala-xml_2.11
In your code, delete the following import statement:
import com.databricks.spark.xml
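With that import gone, the program might look roughly like the sketch below. The data source is selected by the string passed to format(), so nothing from the spark-xml package has to be imported for this style of read. The sketch also leaves out the unused scala.xml._ import and passes the fileName variable to load() rather than the quoted string "fileName", which would otherwise be treated as a literal path. The explicit Unit return type and the sc.stop() call are my additions, not part of the original code.

```scala
package in.goai.spark

import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}

object SparkMeApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("First Spark")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Path to the XML file, taken from the first command-line argument.
    val fileName = args(0)

    // The XML data source is looked up by name, so no spark-xml import is needed.
    val df = sqlContext.read
      .format("com.databricks.spark.xml")
      .option("rowTag", "book") // each <book> element becomes one row
      .load(fileName)           // the variable, not the string "fileName"

    // Print the two columns of interest.
    df.select("title", "price").show()

    // Added in this sketch: release the context when done.
    sc.stop()
  }
}
```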
Note that your code doesn't actually use the spark-avro or scala-xml libraries. Remove those dependencies from your build.sbt (and the import scala.xml._ statement from your code) if you're not going to use them.
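If you do drop the unused libraries, a trimmed build.sbt could look something like this. It is only a sketch that reuses the project settings from the question and the versions given above. The comment on %% describes standard sbt behavior: it appends the Scala binary version to the artifact name, which is why a hard-coded suffix such as spark-avro_2.10 does not fit a Scala 2.11 build.

```scala
name := "First Spark"
version := "1.0"
organization := "in.goai"
scalaVersion := "2.11.8"

// %% appends the Scala binary version to the artifact name, so with
// scalaVersion 2.11.8 these resolve to spark-core_2.11, spark-sql_2.11
// and spark-xml_2.11.
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.0.0"
libraryDependencies += "com.databricks" %% "spark-xml" % "0.4.1"

// Kept from the question's build file.
resolvers += Resolver.mavenLocal
```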