Tags: scala, dataframe, apache-spark, etl

How to explode each row that is an Array into columns in Spark (Scala)?


I have a Spark DataFrame with a single column 'value', where each row is an Array of equal length. How can I explode this single 'value' column into multiple columns that follow a schema like this?

Target schema:

import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

val bronzeDfSchema = new StructType()
  .add("DATE", IntegerType)
  .add("NUMARTS", IntegerType)
  .add("COUNTS", StringType)
  .add("THEMES", StringType)
  .add("LOCATIONS", StringType)
  .add("PERSONS", StringType)
  .add("ORGANIZATIONS", StringType)
  .add("TONE", StringType)
  .add("CAMEOEVENTIDS", StringType)
  .add("SOURCES", StringType)
  .add("SOURCEURLS", StringType)

Thank you!
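For concreteness, here is a hypothetical one-row example of the kind of input DataFrame described above (the values are made up; `spark` is an existing SparkSession):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("example").getOrCreate()
import spark.implicits._

// One 'value' column; every row is an equal-length array of strings.
val df = Seq(
  Array("20230101", "5", "c1", "t1", "l1", "p1", "o1", "1.0", "e1", "s1", "u1"),
  Array("20230102", "3", "c2", "t2", "l2", "p2", "o2", "2.0", "e2", "s2", "u2")
).toDF("value")

df.printSchema()
// root
//  |-- value: array (nullable = true)
//  |    |-- element: string (containsNull = true)
```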


Solution

  • This should work just fine

    import org.apache.spark.sql.functions.col

    val schema = Seq(
      ("DATE", 0), ("NUMARTS", 1), ("COUNTS", 2), ("THEMES", 3),
      ("LOCATIONS", 4), ("PERSONS", 5), ("ORGANIZATIONS", 6),
      ("TONE", 7), ("CAMEOEVENTIDS", 8), ("SOURCES", 9), ("SOURCEURLS", 10)
    )

    // Fold over the (name, index) pairs, adding one column per array element.
    val df2 = schema.foldLeft(df) { case (acc, (name, idx)) =>
      acc.withColumn(name, col("value").getItem(idx))
    }
    

    After you do this, just cast each column to the data type you want.
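For the casting step, a minimal sketch (assuming the array elements are strings and that only DATE and NUMARTS need to become integers, per the target schema) could look like this; the `drop` of the original 'value' column is optional:

```scala
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.IntegerType

// Cast the columns that should not remain strings, then drop the source array.
val bronzeDf = df2
  .withColumn("DATE", col("DATE").cast(IntegerType))
  .withColumn("NUMARTS", col("NUMARTS").cast(IntegerType))
  .drop("value")

bronzeDf.printSchema()
// root
//  |-- DATE: integer (nullable = true)
//  |-- NUMARTS: integer (nullable = true)
//  |-- COUNTS: string (nullable = true)
//  ... and so on for the remaining string columns
```

Note that `cast` yields null (rather than throwing) for values that cannot be converted, so it is worth checking for unexpected nulls afterwards.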