Search code examples
pythonpandaspandas-melt

How to unpivot dataset in Python


I have a dataset which I need to unpivot into multiple rows into columns.

Eg:

ID  Currency    Val1    Val2        Month
101 INR     57007037.32 1292025.24  2021-03
101 INR     49171143.9  1303785.98  2021-02
101 INR     54039073.81 1469727.23  2021-01
101 INR     67733998.9  1370086.78  2020-12
101 INR     48838409.39 1203648.32  2020-11
101 INR     43119693.71 0           2020-10

I need to pivot / unpivot them as below.

ID  Currency    Keys        2021-03     2021-02         2021-01         2020-12     2020-11     2020-10
101 INR         Val1    57007037.32     49171143.9      54039073.81     67733998.9  48838409.39     4311.71
101 INR         Val2    1292025.24      1303785.98      1469727.23      1370086.78  1203648.32      0

I have tried df.melt but that is not the giving exact output.

df = pd.read_csv('/path/abc.csv')
print(df.melt(id_vars=['id'], var_name=['month'], value_vars=['val1','val2']))

Please help on this.


Solution

  • Another version using .pivot:

    print(
        pd.pivot(
            df,
            index=["ID", "Currency"],
            columns="Month",
            values=["Val1", "Val2"],
        )
        .stack(level=0)
        .reset_index()
        .rename(columns={"level_2": "keys"})
    )
    

    Prints:

    Month   ID Currency  keys      2020-10      2020-11      2020-12      2021-01      2021-02      2021-03
    0      101      INR  Val1  43119693.71  48838409.39  67733998.90  54039073.81  49171143.90  57007037.32
    1      101      INR  Val2         0.00   1203648.32   1370086.78   1469727.23   1303785.98   1292025.24
    

    EDIT: Solution that works in pandas=0.22.0:

    print(
        df.set_index(["ID", "Currency"])
        .pivot(columns="Month")
        .stack(level=0)
        .reset_index()
        .rename(columns={"level_2": "keys"})
    )
    

    Prints:

    Month   ID Currency  keys      2020-10      2020-11      2020-12      2021-01      2021-02      2021-03
    0      101      INR  Val1  43119693.71  48838409.39  67733998.90  54039073.81  49171143.90  57007037.32
    1      101      INR  Val2         0.00   1203648.32   1370086.78   1469727.23   1303785.98   1292025.24
    

    EDIT: To sort columns:

    x = (
        df.set_index(["ID", "Currency"])
        .pivot(columns="Month")
        .stack(level=0)
        .reset_index()
        .rename(columns={"level_2": "keys"})
    )
    x.columns.name = ""
    x = x[["ID", "Currency", "keys"] + sorted(x.columns, reverse=True)[3:]]
    print(x)
    

    Prints:

        ID Currency  keys      2021-03      2021-02      2021-01      2020-12      2020-11      2020-10
    0  101      INR  Val1  57007037.32  49171143.90  54039073.81  67733998.90  48838409.39  43119693.71
    1  101      INR  Val2   1292025.24   1303785.98   1469727.23   1370086.78   1203648.32         0.00