I want to transpose multiple rows of a datafarme in multiple columns, as shown below:
Initial df:
Code | Cat | Count | Value |
---|---|---|---|
S1 | A | 1 | 10 |
S1 | B | 2 | 15 |
S2 | A | 3 | 20 |
S2 | B | 4 | 25 |
Final df:
Code | Count A | Value A | Count B | Value B |
---|---|---|---|---|
S1 | 1 | 10 | 2 | 15 |
S2 | 3 | 20 | 4 | 25 |
Trying the pivot function I can transpose only one between Count and Value
df = df.groupBy("Code").pivot("Cat").agg(f.first("Count"))
Code | A | B |
---|---|---|
S1 | 1 | 2 |
S2 | 3 | 4 |
Making two intermediate dataframes and then join them could be the only way to do it?
There's no need to join, you can do multiple aggregations in one go
result = df.groupBy('Code').pivot('Cat').agg(F.first('Count').alias('Count'),
F.first('Value').alias('Value'))
+----+-------+-------+-------+-------+
|Code|A_Count|A_Value|B_Count|B_Value|
+----+-------+-------+-------+-------+
| S2| 3| 20| 4| 25|
| S1| 1| 10| 2| 15|
+----+-------+-------+-------+-------+