Search code examples
pandasmulti-index

pandas MultiIndex assign multiple columns


I have created a dataframe with a MultiIndex like below:

import numpy as np
import pandas as pd
column_index= [np.array(['OPEN','OPEN','CLOSE','CLOSE']),np.array(['IBM','AAPL','IBM','AAPL'])]
df = pd.DataFrame(np.transpose(np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12]])),index=['20190101','20190102','20190103'],columns=column_index)

The result is like this:

          OPEN          CLOSE
          IBM   AAPL    IBM AAPL
20190101    1    4      7   10
20190102    2    5      8   11
20190103    3    6      9   12

Now I'd like to create a new set of columns by doing something like:

df['RTN'] = df.CLOSE / df.OPEN

To get:

            OPEN      CLOSE       RTN
            IBM AAPL  IBM   AAPL  IBM   AAPL
20190101    1   4     7     10    7.0   2.5
20190102    2   5     8     11    4.0   2.2
20190103    3   6     9     12    3.0   2.0

That does not work. The nicest way I've been able to do this is like so:

rtn = df.CLOSE / df.OPEN
rtn = pd.concat([rtn],keys=['RTN'],axis=1)
df = pd.concat([df,rtn],axis=1)

Is there a way to do this as an assignment without the other steps?


Solution

  • One way is to rename the columns prior to the operations. Then it's a simple concat:

    u = df.loc[:, ['CLOSE']].rename(columns={'CLOSE': 'RTN'}, level=0).divide(
            df.loc[:, ['OPEN']].rename(columns={'OPEN': 'RTN'}, level=0))
                       # [] DataFrame selection keeps MultiIndex
    
    pd.concat([df, u], axis=1)
    

    Alternatively, you can stack + eval + unstack. It's concise, but perhaps not super performant for large datasets.

    df.stack().eval('RTN = CLOSE/OPEN').unstack()
    #df.stack().assign(RTN = lambda x: x.CLOSE/x.OPEN).unstack()
    

    Without eval:

    df.stack().assign(RTN = lambda x: x.CLOSE/x.OPEN).unstack()
    
    #or 
    
    df = df.stack()
    df['RTN'] = df.CLOSE/df.OPEN
    df = df.unstack()
    

    Output in all cases:

             OPEN      CLOSE       RTN     
              IBM AAPL   IBM AAPL  IBM AAPL
    20190101    1    4     7   10  7.0  2.5
    20190102    2    5     8   11  4.0  2.2
    20190103    3    6     9   12  3.0  2.0