Tags: scala, apache-spark, apache-spark-sql, apache-spark-dataset

Spark explode in Scala - Add exploded column to the row


I have a Spark Dataframe with the following contents:

Name  E1  E2  E3
abc    4   5   6

I need the various E columns to become rows in a new column as shown below:

Name  value  EType
abc   4      E1
abc   5      E2
abc   6      E3

This answer gave me the idea of using explode and I now have the following code:

df.select($"Name", explode(array("E1", "E2", "E3")).as("value"))

The above code gives me the Name and value columns I need, but I still need a way to add an EType column that records which of the source columns (E1, E2, E3) each exploded value came from.

Output of the above code:

Name  value
abc   4
abc   5
abc   6

How can I add the EType column?

(I am using Spark 2.2 with Scala)

Thanks!


Solution

  • You can use the stack function for this particular case. stack(n, expr1, ..., exprk) separates its arguments into n rows; here each row carries a value and a string literal naming the column it came from.

    df.selectExpr("Name", "stack(3, E1, 'E1', E2, 'E2', E3, 'E3')")
      .toDF("Name", "value", "EType")
      .show()
    
    +----+-----+-----+
    |Name|value|EType|
    +----+-----+-----+
    | abc|    4|   E1|
    | abc|    5|   E2|
    | abc|    6|   E3|
    +----+-----+-----+
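
    If you prefer to stay in the DataFrame API rather than a SQL expression string, an equivalent approach (a sketch along the lines of the explode attempt in the question, not the answer's exact method) is to explode an array of structs that pairs each value with its column name:

    import org.apache.spark.sql.functions.{array, col, explode, lit, struct}

    // Build one struct per source column: (value, EType), collect them into
    // an array, explode the array, then flatten the struct fields back out.
    val result = df
      .select(
        col("Name"),
        explode(array(
          Seq("E1", "E2", "E3").map(c => struct(col(c).as("value"), lit(c).as("EType"))): _*
        )).as("e"))
      .select(col("Name"), col("e.value"), col("e.EType"))

    result.show()

    This produces the same three rows per input row and works on Spark 2.2; it also keeps the column types visible to the analyzer instead of hiding them in a SQL string.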