Search code examples
sqlpandaspysparkapache-spark-sqlpivot

pyspark transform column to row


I have pyspark dataframe below,

enter image description here

I need to convert the dataframe in the following manner, I need to pivot row into a separate column and add it to corresponding month.

enter image description here

I have tried the following approach, df.groupBY("Month").pivot("col1"). I am not getting the desired output.


Solution

  • You can use pandas_api() to enable Pandas API on Spark DataFrame (not to be confused with to_pandas()):

    sdf = (sdf.pandas_api()
              .melt('Month', var_name='Col_name', value_name='Value')
              .to_spark())
    

    Output:

    >>> sdf.show()
    +-----+--------+-----+
    |Month|Col_name|Value|
    +-----+--------+-----+
    |    1|    Col1|   10|
    |    1|    Col2|   20|
    |    1|    Col3|   30|
    |    1|    Col4|   40|
    |    2|    Col1|   20|
    |    2|    Col2|   30|
    |    2|    Col3|   50|
    |    2|    Col4|   60|
    |    3|    Col1|   30|
    |    3|    Col2|   50|
    |    3|    Col3|   60|
    |    3|    Col4|   70|
    |    4|    Col1|   30|
    |    4|    Col2|   40|
    |    4|    Col3|   50|
    |    4|    Col4|   80|
    +-----+--------+-----+