Search code examples
pandasfunctionapplymulti-index

How to make length of values and index equal when applying a function to a multiindex dataframe?


Suppose I create a multiindex dataframe like this:

import pandas as pd

date_index = pd.date_range(start='1/1/2019', end='1/3/2019')
symbol_index = ['AAPL','BOA','GE','MSFT']
idx = pd.MultiIndex.from_product([date_index, symbol_index], names=['Date', 'Symbol'])
col = ['val_to_change','new_val']
data = pd.DataFrame(index=idx,columns=col)

I want to make some calculations row by row like this:

initial_variable = 0

def calc(row):
    global initial_variable
    row['val_to_change'] = initial_variable
    #I make further calculations here with row['val_to_change']
    row['new_val'] = row['val_to_change'] + 500
    initial_variable = row['new_val'].values
    return row

data = data.groupby('Date').apply(calc)

This gives me the desired output:

                   val_to_change  new_val
Date       Symbol                        
2019-01-01 AAPL                0      500
           BOA                 0      500
           GE                  0      500
           MSFT                0      500
2019-01-02 AAPL              500     1000
           BOA               500     1000
           GE                500     1000
           MSFT              500     1000
2019-01-03 AAPL             1000     1500
           BOA              1000     1500
           GE               1000     1500
           MSFT             1000     1500

However, if the dataframe does not have the same amount of Symbol level index for each Date index, then I get an error for the above function. If I drop a Symbol from a Date:

data.drop(('2019-01-01','AAPL'),inplace=True)

And now apply the same function to data, then I get an error:

ValueError: Length of values does not match length of index

because the first Date index has now only 3 Symbol indices. How can I overcome this problem? I want to apply a function like this on my dataframe, which has a multiindex but the multiindex does not have the same amount of Symbol index for every Date index.


Solution

  • Use DataFrame.unstack with DataFrame.stack and parameter dropna=False for avoid drop new missing rows filled by NaNs:

    data.drop(('2019-01-01','AAPL'),inplace=True)
    
    data = data.unstack().stack(dropna=False)
    data = data.groupby('Date').apply(calc)
    print (data)
                       val_to_change  new_val
    Date       Symbol                        
    2019-01-01 AAPL                0      500
               BOA                 0      500
               GE                  0      500
               MSFT                0      500
    2019-01-02 AAPL              500     1000
               BOA               500     1000
               GE                500     1000
               MSFT              500     1000
    2019-01-03 AAPL             1000     1500
               BOA              1000     1500
               GE               1000     1500
               MSFT             1000     1500