Tags: scala, dataframe, apache-spark-sql

Adding two columns to existing DataFrame using withColumn


I have a DataFrame with a few columns. Now I want to add two more columns to the existing DataFrame.

Currently I am doing this using the withColumn method on the DataFrame.

For example:

df.withColumn("newColumn1", udf(col("somecolumn")))
  .withColumn("newColumn2", udf(col("somecolumn")))

Actually, I could return both new column values from a single UDF as an Array[String]. But currently this is how I am doing it.

Is there any way I can do this more efficiently? Is using explode a good option here?

Even if I use explode, I still have to call withColumn once to return the values as an Array[String], and then use explode to create the two additional columns.
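As a sketch of the single-UDF alternative mentioned above: instead of explode, the UDF can return a tuple (which Spark stores as a struct column) and the two fields can be selected out directly. The UDF body here is a placeholder, not from the question:

```scala
import org.apache.spark.sql.functions.{col, udf}

// Placeholder UDF computing both values at once; Spark represents
// the returned tuple as a struct column with fields _1 and _2.
val bothValues = udf((s: String) => (s.toUpperCase, s.length))

val result = df
  .withColumn("tmp", bothValues(col("somecolumn")))
  .withColumn("newColumn1", col("tmp._1"))
  .withColumn("newColumn2", col("tmp._2"))
  .drop("tmp")
```

This evaluates the UDF once per row and needs no explode, at the cost of a temporary struct column.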

Which approach is more efficient? Or is there an alternative?

**Update:** Per @blert's answer, withColumns is the way to go.


Solution

  • May 2023: It is now possible, with the new withColumns method (note the final 's'), to add several columns to an existing Spark DataFrame without calling withColumn several times. You just need a Map[String, Column]. Given two UDFs for this example, udf1 and udf2, you could use this new method like this:

    val dfNew = df.withColumns(Map(
      "newCol1" -> udf1(col("oldCol1")),
      "newCol2" -> udf2(col("oldCol2"))
    ))
    

    More information can be found in the official Spark documentation.
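A minimal, self-contained sketch of the above (the DataFrame and UDF bodies are placeholders standing in for udf1/udf2; withColumns with a Map argument is available from Spark 3.3 onward):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("withColumns-demo")
  .getOrCreate()
import spark.implicits._

// Placeholder input data.
val df = Seq(("a", 1), ("b", 2)).toDF("oldCol1", "oldCol2")

// Placeholder UDFs standing in for udf1 and udf2 from the answer.
val udf1 = udf((s: String) => s.toUpperCase)
val udf2 = udf((n: Int) => n * 10)

// A single withColumns call adds both columns at once.
val dfNew = df.withColumns(Map(
  "newCol1" -> udf1(col("oldCol1")),
  "newCol2" -> udf2(col("oldCol2"))
))

dfNew.show()
```

Unlike chained withColumn calls, which each create a new projection in the query plan, a single withColumns call adds all the columns in one projection.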