I have a dataframe
syr | P1 | P2
-----------------
1 | 200 | 300
2 | 500 | 700
3 | 900 | 400
I want to create another DataFrame which has max value between col2
& col3
. An expected output is like:
syr | P1 | P2 | max
-------------------------
1 | 200 | 300 | 300
2 | 500 | 700 | 700
3 | 900 | 400 | 900
You could define a new UDF function to catch the max value between two column, like:
def maxDef(p1: Int, p2: Int): Int = if(p1>p2) p1 else p2
val max = udf[Int, Int, Int](maxDef)
And then apply the UDF in a withColumn()
to define a new Column, like:
val df1 = df.withColumn("max", max(df.col("P1"), df.col("P2")))
+---+---+---+---+
|syr| P1| P2|max|
+---+---+---+---+
| 1|200|300|300|
| 2|500|700|700|
| 3|900|400|900|
+---+---+---+---+
EDIT: Iterate through columns
First initialize the max
Column:
df = df.withColumn("max", lit(0))
then foreach Column you want (use filter
function property) compare it with the max
Column.
df.columns.filter(_.startsWith("P")).foreach(col => {
df = df.withColumn("max", max(df.col("max"), df.col(col)))
})