dataframeapache-sparkpysparkpivottranspose

Pyspark Transpose multiple rows in multiple columns


I want to transpose multiple rows of a datafarme in multiple columns, as shown below:

Initial df:

Code Cat Count Value
S1 A 1 10
S1 B 2 15
S2 A 3 20
S2 B 4 25

Final df:

Code Count A Value A Count B Value B
S1 1 10 2 15
S2 3 20 4 25

Trying the pivot function I can transpose only one between Count and Value

df = df.groupBy("Code").pivot("Cat").agg(f.first("Count"))

Code A B
S1 1 2
S2 3 4

Making two intermediate dataframes and then join them could be the only way to do it?


Solution

  • There's no need to join, you can do multiple aggregations in one go

    result = df.groupBy('Code').pivot('Cat').agg(F.first('Count').alias('Count'), 
                                                 F.first('Value').alias('Value'))
    

    +----+-------+-------+-------+-------+
    |Code|A_Count|A_Value|B_Count|B_Value|
    +----+-------+-------+-------+-------+
    |  S2|      3|     20|      4|     25|
    |  S1|      1|     10|      2|     15|
    +----+-------+-------+-------+-------+