sql pandas pyspark apache-spark-sql pivot

pyspark transform column to row

I have pyspark dataframe below,

I need to convert the dataframe in the following manner, I need to pivot row into a separate column and add it to corresponding month.

I have tried the following approach, df.groupBY("Month").pivot("col1"). I am not getting the desired output.

Solution

You can use pandas_api() to enable Pandas API on Spark DataFrame (not to be confused with to_pandas()):

sdf = (sdf.pandas_api()
          .melt('Month', var_name='Col_name', value_name='Value')
          .to_spark())

Output:

>>> sdf.show()
+-----+--------+-----+
|Month|Col_name|Value|
+-----+--------+-----+
|    1|    Col1|   10|
|    1|    Col2|   20|
|    1|    Col3|   30|
|    1|    Col4|   40|
|    2|    Col1|   20|
|    2|    Col2|   30|
|    2|    Col3|   50|
|    2|    Col4|   60|
|    3|    Col1|   30|
|    3|    Col2|   50|
|    3|    Col3|   60|
|    3|    Col4|   70|
|    4|    Col1|   30|
|    4|    Col2|   40|
|    4|    Col3|   50|
|    4|    Col4|   80|
+-----+--------+-----+