Search code examples
pythonpandasmulti-index

Pandas multi index


I currently have a pandas multi index of this structure(with ticker and field being the indices):

                                             value
ticker            field                       
DE0001141174 Govt CASH_FLOW_DATE    2000-11-21
                  CASH_FLOW_AMOUNT       51250
                  PRINCIPAL_AMOUNT       1e+06
DE0001141232 Govt CASH_FLOW_DATE    2000-05-17
                  CASH_FLOW_AMOUNT       45000
                  PRINCIPAL_AMOUNT           0
                  CASH_FLOW_DATE    2001-05-17
                  CASH_FLOW_AMOUNT       45000
                  PRINCIPAL_AMOUNT           0
                  CASH_FLOW_DATE    2002-05-17
                  CASH_FLOW_AMOUNT       45000
                  PRINCIPAL_AMOUNT       1e+06
DE0001141380 Govt CASH_FLOW_DATE    2002-08-18
                  CASH_FLOW_AMOUNT     67808.2
                  PRINCIPAL_AMOUNT           0
                  CASH_FLOW_DATE    2003-08-18
                  CASH_FLOW_AMOUNT       45000
                  PRINCIPAL_AMOUNT           0
                  CASH_FLOW_DATE    2004-08-18
                  CASH_FLOW_AMOUNT       45000
                  PRINCIPAL_AMOUNT           0
                  CASH_FLOW_DATE    2005-08-18
                  CASH_FLOW_AMOUNT       45000
                  PRINCIPAL_AMOUNT           0
                  CASH_FLOW_DATE    2006-08-18
                  CASH_FLOW_AMOUNT       45000
                  PRINCIPAL_AMOUNT       1e+06

And i want it to be transformed into a structure like this with the ticker and CASH_FLOW_DATE being the indices:

ticker              CASH_FLOW_DATE  CASH_FLOW_AMOUNT    PRINCIPAL_AMOUNT          
DE0001141174 Govt   2000-11-21      51250               1e+06
DE0001141232 Govt   2000-05-17      45000               0   
                    2001-05-17      45000               0
                    2002-05-17      45000               1e+06
DE0001141380 Govt   2002-08-18      67808.2             0
                    2003-08-18      45000               0
                    2004-08-18      45000               0
                    2005-08-18      45000               0
                    2006-08-18      45000               0

I guess the problem is that python/pandas has no way to naturally recognize that the two rows below 'CASH_FLOW_DATE' are related to that value. I guess I could do it with a lot of ugly loops, but I was wondering if there is a more pythonic way of doing this.


Solution

  • You need cumcount for new level of index which is appended by set_index to original index and then call unstack:

    df = df.set_index(df.groupby(level=[0,1]).cumcount(), append=True)
    df = df['value'].unstack(level=1, fill_value=0).reset_index(level=1, drop=True).reset_index()
    print (df)
    field             ticker CASH_FLOW_AMOUNT CASH_FLOW_DATE PRINCIPAL_AMOUNT
    0      DE0001141174 Govt            51250     2000-11-21            1e+06
    1      DE0001141232 Govt            45000     2000-05-17                0
    2      DE0001141232 Govt            45000     2001-05-17                0
    3      DE0001141232 Govt            45000     2002-05-17            1e+06
    4      DE0001141380 Govt          67808.2     2002-08-18                0
    5      DE0001141380 Govt            45000     2003-08-18                0
    6      DE0001141380 Govt            45000     2004-08-18                0
    7      DE0001141380 Govt            45000     2005-08-18                0
    8      DE0001141380 Govt            45000     2006-08-18            1e+06