dataframe scala apache-spark sequence

Append Seq[Column] to an existing Spark DataFrame in Scala


I have a Spark DataFrame df_data and a Seq[Column] named Metrics. I am trying to append the columns in the Seq[Column] to the existing DataFrame. The Seq[Column] will contain multiple columns.

I was able to append a single column to the DataFrame, but I am not able to do the same with a Seq[Column]:

  val new_df = Metrics.foldLeft(df_data)((df_data, newColumn: (Column)) =>
              df_data.withColumn("column_name", newColumn))

I have not worked much with Seq and need help.


Solution

  • use:

    import org.apache.spark.sql.functions.expr
    val new_df = df_data.select((expr("*") +: Metrics) :_*)
    

    Whether the brackets are needed depends on the Spark/Scala version. It is worth avoiding withColumn in folds/loops, as each call adds a projection and these aren't always optimised out.
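
    For illustration, here is a minimal self-contained sketch of the select approach next to a fold that works. The sample data, the metric expressions, the aliases, and the helper names appendAll and appendWithFold are all assumptions made up for this example; only df_data, Metrics and new_df come from the question. Note that the fold in the question passes the same literal "column_name" on every iteration, so each withColumn call replaces the column added by the previous one and only the last metric survives.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, expr}

object AppendColumns {

  // Hypothetical helper: one projection holding every existing column plus the new ones.
  def appendAll(df: DataFrame, metrics: Seq[Column]): DataFrame =
    df.select((expr("*") +: metrics): _*)

  // Hypothetical helper: fold variant for contrast. Each column needs its own name,
  // otherwise withColumn keeps overwriting the same column.
  def appendWithFold(df: DataFrame, named: Seq[(String, Column)]): DataFrame =
    named.foldLeft(df) { case (acc, (name, c)) => acc.withColumn(name, c) }

  def main(args: Array[String]): Unit = {
    // Local session purely for the example.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("append-columns")
      .getOrCreate()
    import spark.implicits._

    // Assumed stand-ins for df_data and Metrics from the question.
    val df_data: DataFrame = Seq((1, 10.0), (2, 20.0)).toDF("id", "value")
    val Metrics: Seq[Column] = Seq(
      (col("value") * 2).as("value_doubled"),
      (col("value") + col("id")).as("value_plus_id")
    )

    val new_df = appendAll(df_data, Metrics)
    // Columns: id, value, value_doubled, value_plus_id
    println(new_df.columns.mkString(", "))

    spark.stop()
  }
}
```

    Aliasing each Column with .as(...) before the select gives the appended columns stable names; without an alias Spark derives the name from the expression itself.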