Assume you have
val df = Seq(("Jack", 91, 86), ("Mike", 79, 85), ("Julia", 93, 70)).toDF("Name", "Maths", "Art")
which gives:
+-----+-----+---+
| Name|Maths|Art|
+-----+-----+---+
| Jack| 91| 86|
| Mike| 79| 85|
|Julia| 93| 70|
+-----+-----+---+
Now you want to unpivot it by:
df.select($"Name", expr("stack(2, 'Maths', Maths, 'Art', Art) as (Subject, Score)"))
which gives:
+-----+-------+-----+
| Name|Subject|Score|
+-----+-------+-----+
| Jack| Maths| 91|
| Jack| Art| 86|
| Mike| Maths| 79|
| Mike| Art| 85|
|Julia| Maths| 93|
|Julia| Art| 70|
+-----+-------+-----+
So far so good! But what if you don't know the list of column names in advance? What if the list is long, or changes over time? How can we avoid hardcoding the column names like that?
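One way is to build the stack() expression string from the column list at runtime. A sketch (stackExpr is a hypothetical helper, not a Spark API; note that stack requires all melted columns to share a compatible type):

```scala
// Build "stack(2, 'Maths', `Maths`, 'Art', `Art`) as (Subject, Score)"
// from a column list instead of hardcoding it.
// Backticks guard column names containing spaces or dots.
def stackExpr(toMelt: Seq[String], variableCol: String, valueCol: String): String =
  s"stack(${toMelt.size}, " +
    toMelt.map(c => s"'$c', `$c`").mkString(", ") +
    s") as ($variableCol, $valueCol)"

// Then, in Spark:
// val toMelt = df.columns.filterNot(_ == "Name")
// df.select($"Name", expr(stackExpr(toMelt, "Subject", "Score")))
```

This keeps the query identical to the hardcoded version while deriving the column list from df.columns.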
Or even something like this is also good:
// hypothetical API, does not exist as written
df.select($"Name", unpivot(df.columns.diff(Seq("Name"))) as ("Subject", "Score"))
Why don't we have an API like this?
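For what it's worth, newer Spark versions do ship such an API: Spark 3.4 added Dataset.unpivot (with melt as an alias). A sketch, assuming Spark 3.4+:

```scala
// Spark 3.4+ only: built-in unpivot (alias: melt).
// With this overload, every column not listed in `ids` is unpivoted.
val longDf = df.unpivot(
  ids = Array($"Name"),
  variableColumnName = "Subject",
  valueColumnName = "Score"
)
```

On older versions you still need one of the workarounds below.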
This works quite well indeed:
import org.apache.spark.sql.functions.{array, col, explode, lit, struct}
import org.apache.spark.sql.DataFrame

def melt(preserves: Seq[String],
         toMelt: Seq[String],
         column: String = "variable",
         row: String = "value",
         df: DataFrame): DataFrame = {
  // One struct per melted column: (column name, column value).
  val varsAndVals = array(toMelt.map(c => struct(lit(c).alias(column), col(c).alias(row))): _*)
  // explode() produces one row per melted column.
  val tmp = df.withColumn("_vars_and_vals", explode(varsAndVals))
  // Keep the preserved columns, then pull name and value out of the struct.
  val cols = preserves.map(col) ++
    Seq(column, row).map(x => col("_vars_and_vals")(x).alias(x))
  tmp.select(cols: _*)
}
Source: How to melt Spark DataFrame? Thanks to @user10938362
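Wiring that melt into the example above, with the column list derived from df.columns rather than hardcoded:

```scala
// Melt every column except "Name", computing the list at runtime.
val melted = melt(
  preserves = Seq("Name"),
  toMelt = df.columns.toSeq.diff(Seq("Name")),
  column = "Subject",
  row = "Score",
  df = df
)
melted.show()
```

This produces the same six-row (Name, Subject, Score) output as the stack() version, without naming Maths or Art anywhere.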