Tags: scala, apache-spark, apache-spark-sql, azure-databricks

Parsing nested XML in Databricks

I am trying to read the XML into a data frame and flatten the data using explode, as shown below.

    val df = spark.read.format("xml")
      .option("rowTag", "on")
      .option("inferSchema", "true")
      .load("filepath")

    val parsxml = df.withColumn("exploded_element", explode(("prgSvc.element")))

I am getting the following error:

    command-5246708674960:4: error: type mismatch;
     found   : String("prgSvc.element")
     required: org.apache.spark.sql.Column
    .withColumn("exploded_element", explode(("prgSvc.element")))

Before reading the XML into the data frame, I also tried manually assigning a custom schema, but the resulting output was all NULL. Could you please let me know whether my approach is valid, and how to resolve this issue and achieve the desired output?
Thank you.
    

Solution

  • The error occurs because explode expects an org.apache.spark.sql.Column, not a String. Import spark.implicits._ and use the $ column interpolator:

    import spark.implicits._
    
    val parsxml = df.withColumn("exploded_element", explode($"prgSvc.element"))
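
As a minimal end-to-end sketch, the fix can also be written with col(...) instead of the $ interpolator. This assumes the spark-xml package is on the classpath, and the file path is a hypothetical placeholder; "on" and "prgSvc.element" are the row tag and nested field from the question.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, explode}

    val spark = SparkSession.builder().appName("xml-flatten").getOrCreate()

    // Read the XML, treating each <on> element as one row.
    val df = spark.read
      .format("xml")
      .option("rowTag", "on")
      .option("inferSchema", "true")
      .load("/path/to/file.xml")  // hypothetical path

    // explode requires a Column, so wrap the field name in col(...)
    // (or use $"prgSvc.element" after importing spark.implicits._).
    val parsxml = df.withColumn("exploded_element", explode(col("prgSvc.element")))

    parsxml.printSchema()

Note that explode produces one output row per element of the array, so a NULL or empty array drops the row; if the custom-schema attempt produced all NULLs, it is worth checking that the schema's field names and nesting exactly match the XML tags, since mismatched names silently yield NULL columns.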