I have pyspark dataframe below,
I need to convert the dataframe in the following manner, I need to pivot row into a separate column and add it to corresponding month.
I have tried the following approach,
df.groupBY("Month").pivot("col1").
I am not getting the desired output.
You can use pandas_api()
to enable Pandas API on Spark DataFrame (not to be confused with to_pandas()
):
sdf = (sdf.pandas_api()
.melt('Month', var_name='Col_name', value_name='Value')
.to_spark())
Output:
>>> sdf.show()
+-----+--------+-----+
|Month|Col_name|Value|
+-----+--------+-----+
| 1| Col1| 10|
| 1| Col2| 20|
| 1| Col3| 30|
| 1| Col4| 40|
| 2| Col1| 20|
| 2| Col2| 30|
| 2| Col3| 50|
| 2| Col4| 60|
| 3| Col1| 30|
| 3| Col2| 50|
| 3| Col3| 60|
| 3| Col4| 70|
| 4| Col1| 30|
| 4| Col2| 40|
| 4| Col3| 50|
| 4| Col4| 80|
+-----+--------+-----+