I have a Spark Dataframe with the following contents:
Name | E1 | E2 | E3 |
---|---|---|---|
abc | 4 | 5 | 6 |
I need the values of the various E
columns to become rows in a new value column, with the source column name recorded in an EType column, as shown below:
Name | value | EType |
---|---|---|
abc | 4 | E1 |
abc | 5 | E2 |
abc | 6 | E3 |
This answer
gave me the idea of using explode
and I now have the following code:
df.select($"Name", explode(array("E1", "E2", "E3")).as("value"))
The above code gives me the Name and value columns I need, but I still need a way to add the EType column, i.e. the name of the column in the array passed to explode
whose value populates that particular row.
Output of the above code:
Name | value |
---|---|
abc | 4 |
abc | 5 |
abc | 6 |
How can I add the EType column?
(I am using Spark 2.2 with Scala)
Thanks!
You can use the stack
function for this particular case: stack(n, expr1, ..., exprk) separates the k expressions into n rows, so pairing each E column with its name as a string literal produces exactly the value/EType pairs you need.
df.selectExpr("Name", "stack(3, E1, 'E1', E2, 'E2', E3, 'E3')").toDF("Name", "value", "EType").show()
+----+-----+-----+
|Name|value|EType|
+----+-----+-----+
| abc| 4| E1|
| abc| 5| E2|
| abc| 6| E3|
+----+-----+-----+
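For completeness, the explode route from the question can also produce the EType column if each array element is a struct carrying both the value and the name of the column it came from. A minimal sketch (assuming the same df, a SparkSession in scope for the implicits as in the question's snippet, and Spark 2.2 with Scala; the unpivoted name is just illustrative):

import org.apache.spark.sql.functions._
// import spark.implicits._  // already assumed by the $"..." syntax in the question

// Build an array of (EType, value) structs, one per E column,
// explode it into one row per element, then pull the struct fields out.
val unpivoted = df
  .select($"Name", explode(array(
    struct(lit("E1").as("EType"), $"E1".as("value")),
    struct(lit("E2").as("EType"), $"E2".as("value")),
    struct(lit("E3").as("EType"), $"E3".as("value"))
  )).as("e"))
  .select($"Name", $"e.value", $"e.EType")

unpivoted.show()

This keeps everything in the DataFrame API, at the cost of being more verbose than the stack expression above.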