I have the following pandas dataframe and I would like to transpose it.
YEAR | CD_PRD | VL_JAN | VL_FEB | VL_MAR | VL_APR |
---|---|---|---|---|---|
2022 | CD76 | 10000 | 8000 | 170000 | 58000 |
2022 | CD92 | 15000 | 7500 | 86000 | 1200 |
2023 | CD76 | 8000 | 12000 | 15000 | 5000 |
2023 | CD92 | 12000 | 4560 | 10480 | 15600 |
2024 | CD76 | 400 | 8000 | 45800 | 120 |
2024 | CD92 | 56000 | 10200 | 150000 | 1500 |
What I need is to pivot the values columns names e.g.(VL_JAN | VL_FEB | VL_MAR | VL_APR ) into rows, and the rows (CD-PRD) into lines to have the following dataframe at the end:
YEAR | MONTHS | VL_CD76 | VL_CD92 |
---|---|---|---|
2022 | JAN | 10000 | 15000 |
2022 | FEB | 8000 | 7500 |
2022 | MAR | 170000 | 86000 |
2022 | ABR | 58000 | 1200 |
2023 | JAN | 8000 | 12000 |
2023 | FEB | 12000 | 4560 |
2023 | MAR | 15000 | 10480 |
2023 | ABR | 5000 | 15600 |
2024 | JAN | 400 | 56000 |
2024 | FEB | 8000 | 10200 |
2024 | MAR | 45800 | 150000 |
2024 | ABR | 120 | 1500 |
I did the following, but it seems I miss something and did not managed to get the right dataframe structure.
df3 = df2.pivot(index='YEAR', columns=['CD_PRD'], values=['VL_JAN', 'VL_FEB', 'VL_MAR', 'VL_APR']).reset_index()
print(df3)
Any help would be amazing. Thanks!
Marcio
A possible solution, whose steps are:
The first step is to set the index of the dataframe df
to the columns YEAR
and CD_PRD
using the set_index
method.
Then, the stack
method is used to pivot the columns into rows, creating a hierarchical index.
The unstack
method is then applied to pivot the innermost level of the index back into columns.
The reset_index
method is used to reset the index of the dataframe, turning the index back into columns.
The rename_axis
method is used to remove the name of the axis.
The set_axis
method is used to rename the columns of the dataframe to the list colnames
.
Finally, the pipe
method is used to apply a lambda function that uses the assign
method to create a new column MONTH
by removing the prefix VL_
from the values in the MONTH
column using the str.replace
method.
colnames = ['YEAR', 'MONTH', 'VL_CD76', 'VL_CD92']
(df.set_index(['YEAR', 'CD_PRD']).stack()
.unstack(1).reset_index().rename_axis(None, axis=1)
.set_axis(colnames, axis=1)
.pipe(lambda x: x.assign(MONTH = x['MONTH'].str.replace('VL_', ''))))
Output:
YEAR MONTH VL_CD76 VL_CD92
0 2022 JAN 10000 15000
1 2022 FEB 8000 7500
2 2022 MAR 170000 86000
3 2022 APR 58000 1200
4 2023 JAN 8000 12000
5 2023 FEB 12000 4560
6 2023 MAR 15000 10480
7 2023 APR 5000 15600
8 2024 JAN 400 56000
9 2024 FEB 8000 10200
10 2024 MAR 45800 150000
11 2024 APR 120 1500