I have a DataFrame with a few columns, and I want to add two more columns to it. Currently I am doing this with the `withColumn` method of DataFrame, for example:

```scala
df.withColumn("newColumn1", udf(col("somecolumn")))
  .withColumn("newColumn2", udf(col("somecolumn")))
```

I know I could return both new column values from a single UDF as an `Array[String]`, but this is how I am doing it at the moment.

Is there a more efficient way to do this? Is `explode` a good option here?
Even if I use `explode`, I would still have to call `withColumn` once to return the values as an `Array[String]`, and then use `explode` to create the two extra columns.

Which approach is more efficient, and are there any alternatives?
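For reference, the "single UDF returning both values" idea mentioned above can also be done without `explode`, by returning a tuple (a struct column) and then selecting its fields. This is only a sketch, assuming Spark 3.x running locally; the data, UDF, and column names are made up:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

val spark = SparkSession.builder().master("local[1]").appName("sketch").getOrCreate()
import spark.implicits._

val df = Seq("a", "b").toDF("somecolumn")

// One UDF computing both values; a Tuple2 becomes a struct with fields _1 and _2.
val bothUdf = udf((s: String) => (s + "_1", s + "_2"))

// Call the UDF once, then split the struct into two real columns.
val out = df
  .withColumn("both", bothUdf(col("somecolumn")))
  .withColumn("newColumn1", col("both._1"))
  .withColumn("newColumn2", col("both._2"))
  .drop("both")
```

This evaluates the UDF only once per row, at the cost of one temporary struct column.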
**Update:** per @blert's answer, `withColumns` is the way to go.
May 2023: it is now possible to add several columns to an existing Spark DataFrame in a single call with the new `withColumns` method (note the final 's'), instead of chaining `withColumn` several times. You just need a `Map[String, Column]`. Given two UDFs for this example, `udf1` and `udf2`, you could use the new method like this:

```scala
val dfNew = df.withColumns(Map(
  "newCol1" -> udf1(col("oldCol1")),
  "newCol2" -> udf2(col("oldCol2"))
))
```

More information can now be found in the official docs.
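Putting it together, here is a self-contained sketch, assuming Spark 3.3 or later (where `withColumns(Map[String, Column])` exists) running locally; the data and the two UDFs are made up for illustration:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

val spark = SparkSession.builder().master("local[1]").appName("withColumns").getOrCreate()
import spark.implicits._

val df = Seq((1, "x"), (2, "y")).toDF("oldCol1", "oldCol2")

// Hypothetical UDFs standing in for udf1 and udf2 from the answer.
val udf1 = udf((n: Int) => n * 10)
val udf2 = udf((s: String) => s.toUpperCase)

// Add both columns in one call instead of chaining withColumn twice.
val dfNew = df.withColumns(Map(
  "newCol1" -> udf1(col("oldCol1")),
  "newCol2" -> udf2(col("oldCol2"))
))
```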