Search code examples
pythonpandasdataframepandas-groupbytranspose

Convert Rows as Column Headers


I have the following dataframe:

---------------------------------------------------------------
| | TORA |  PS  | Hutan Adat | Tahun | Bulan | Dashboard Name |
---------------------------------------------------------------
|0|  0   |   0  |      0     |  2021 |  Jan  |     Potensi    |   
|1|  0   |   0  |      0     |  2021 |  Jan  |     Usulan     |   
|2|  15  |   0  |      3     |  2021 |  Jan  |     Realisasi  |   
|3|  4   |   6  |      0     |  2021 |  Feb  |     Potensi    | 
|4|  0   |   0  |      0     |  2021 |  Feb  |     Usulan     |   
--------------------------------------------------------------- 

I want it this way.

------------------------------------------------------------
            | Potensi | Usulan | Realisasi | Tahun | Bulan |
------------------------------------------------------------
TORA        |   0     |   0    |  15       | 2021  |  Jan  | 
PS          |   0     |   0    |   0       | 2021  |  Jan  | 
Hutan Adat  |   0     |   0    |   3       | 2021  |  Jan  | 
TORA        |   4     |   0    |  NaN      | 2021  |  Feb  | 
PS          |   6     |   0    |  NaN      | 2021  |  Feb  | 
Hutan Adat  |   0     |   0    |  NaN      | 2021  |  Feb  |
------------------------------------------------------------ 

I have tried: #1

df = df.set_index('Dashboard Name').T

#2

df = df.pivot_table(['TORA','PS','Hutan Adat'],['Tahun','Bulan'],'Dashboard Name')

#3

df = df.set_index(['Dashboard Name', 'Tahun', 'Bulan'], drop = True).T

but I didnt get the expected output that I want


Solution

  • First set Bulan columns to ordered Categoricals, for correct sorting:

    months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep',
              'Oct', 'Nov', 'Dec']
    
    df.Bulan = pd.Categorical(df.Bulan, 
                               ordered=True, 
                               categories=months)
    

    Then reshape by melt with pivoting:

    df = (df.melt(['Tahun','Bulan','Dashboard Name'])
            .pivot(index=['variable','Tahun','Bulan'], 
                         columns='Dashboard Name', 
                         values='value')
            .sort_index(level=[1,2])
            .reset_index(level=[1,2])
            .rename_axis(index=None, columns=None)
            
            )
    print (df)
                Tahun Bulan  Potensi  Realisasi  Usulan
    Hutan Adat   2021   Jan      0.0        3.0     0.0
    PS           2021   Jan      0.0        0.0     0.0
    TORA         2021   Jan      0.0       15.0     0.0
    Hutan Adat   2021   Feb      0.0        NaN     0.0
    PS           2021   Feb      6.0        NaN     0.0
    TORA         2021   Feb      4.0        NaN     0.0
    

    Or:

    df = (df.set_index(['Tahun','Bulan','Dashboard Name'])
            .stack()
            .unstack(level=2)
            .reset_index(level=[0,1])
            .rename_axis(index=None, columns=None)
            
            )
    print (df)
                Tahun Bulan  Potensi  Realisasi  Usulan
    TORA         2021   Jan      0.0       15.0     0.0
    PS           2021   Jan      0.0        0.0     0.0
    Hutan Adat   2021   Jan      0.0        3.0     0.0
    TORA         2021   Feb      4.0        NaN     0.0
    PS           2021   Feb      6.0        NaN     0.0
    Hutan Adat   2021   Feb      0.0        NaN     0.0
    

    Last if order of last columns is important:

    df = df[df.columns[2:].tolist() + df.columns[:2].tolist()]
    print (df)
                Potensi  Realisasi  Usulan  Tahun Bulan
    Hutan Adat      0.0        3.0     0.0   2021   Jan
    PS              0.0        0.0     0.0   2021   Jan
    TORA            0.0       15.0     0.0   2021   Jan
    Hutan Adat      0.0        NaN     0.0   2021   Feb
    PS              6.0        NaN     0.0   2021   Feb
    TORA            4.0        NaN     0.0   2021   Feb