Search code examples
pandasdataframetranspose

Transposing multiple values columns in Pandas


I have the following pandas dataframe and I would like to transpose it.

YEAR CD_PRD VL_JAN VL_FEB VL_MAR VL_APR
2022 CD76 10000 8000 170000 58000
2022 CD92 15000 7500 86000 1200
2023 CD76 8000 12000 15000 5000
2023 CD92 12000 4560 10480 15600
2024 CD76 400 8000 45800 120
2024 CD92 56000 10200 150000 1500

What I need is to pivot the values columns names e.g.(VL_JAN | VL_FEB | VL_MAR | VL_APR ) into rows, and the rows (CD-PRD) into lines to have the following dataframe at the end:

YEAR MONTHS VL_CD76 VL_CD92
2022 JAN 10000 15000
2022 FEB 8000 7500
2022 MAR 170000 86000
2022 ABR 58000 1200
2023 JAN 8000 12000
2023 FEB 12000 4560
2023 MAR 15000 10480
2023 ABR 5000 15600
2024 JAN 400 56000
2024 FEB 8000 10200
2024 MAR 45800 150000
2024 ABR 120 1500

I did the following, but it seems I miss something and did not managed to get the right dataframe structure.

df3 = df2.pivot(index='YEAR', columns=['CD_PRD'], values=['VL_JAN', 'VL_FEB', 'VL_MAR', 'VL_APR']).reset_index()
print(df3)

Any help would be amazing. Thanks!

Marcio


Solution

  • A possible solution, whose steps are:

    • The first step is to set the index of the dataframe df to the columns YEAR and CD_PRD using the set_index method.

    • Then, the stack method is used to pivot the columns into rows, creating a hierarchical index.

    • The unstack method is then applied to pivot the innermost level of the index back into columns.

    • The reset_index method is used to reset the index of the dataframe, turning the index back into columns.

    • The rename_axis method is used to remove the name of the axis.

    • The set_axis method is used to rename the columns of the dataframe to the list colnames.

    • Finally, the pipe method is used to apply a lambda function that uses the assign method to create a new column MONTH by removing the prefix VL_ from the values in the MONTH column using the str.replace method.

    colnames = ['YEAR', 'MONTH', 'VL_CD76', 'VL_CD92']
    
    (df.set_index(['YEAR', 'CD_PRD']).stack()
     .unstack(1).reset_index().rename_axis(None, axis=1)
     .set_axis(colnames, axis=1)
     .pipe(lambda x: x.assign(MONTH = x['MONTH'].str.replace('VL_', ''))))
    

    Output:

        YEAR MONTH  VL_CD76  VL_CD92
    0   2022   JAN    10000    15000
    1   2022   FEB     8000     7500
    2   2022   MAR   170000    86000
    3   2022   APR    58000     1200
    4   2023   JAN     8000    12000
    5   2023   FEB    12000     4560
    6   2023   MAR    15000    10480
    7   2023   APR     5000    15600
    8   2024   JAN      400    56000
    9   2024   FEB     8000    10200
    10  2024   MAR    45800   150000
    11  2024   APR      120     1500