I need to pivot the Table with custom column name's. Please see the table format below.
+-------+------+------+--------+
|balance| ex_cy|rp_prd|scenario|
+-------+------+------+--------+
| 100|2022Q2|202204| abc|
| 200|2022Q2|202204| abc|
| 300|2022Q2|202204| abc|
+-------+------+------+--------+
I want to convert this into below format:
+------+------+--------+---+---+---+
| ex_cy|rp_prd|scenario| C1| C2| C3|
+------+------+--------+---+---+---+
|2022Q2|202204| abc|100|200|300|
+------+------+--------+---+---+---+
Can anyone help on this please??
Lets create an additional id
column to uniquely identify rows per 'ex_cy', 'rp_prd' and 'scenario', then do a groupby
+ pivot
and aggregate balance
with first
cols = ['ex_cy', 'rp_prd', 'scenario']
W = Window.partitionBy(*cols).orderBy('balance')
id_ = F.concat(F.lit('C'), F.row_number().over(W))
result = (
df
.withColumn('id', id_)
.groupBy(*cols).pivot('id').agg(F.first('balance'))
)
Result
+------+------+--------+---+---+---+
| ex_cy|rp_prd|scenario| C1| C2| C3|
+------+------+--------+---+---+---+
|2022Q2|202204| abc|100|200|300|
+------+------+--------+---+---+---+