Suppose I create a multiindex dataframe like this:
import pandas as pd
date_index = pd.date_range(start='1/1/2019', end='1/3/2019')
symbol_index = ['AAPL','BOA','GE','MSFT']
idx = pd.MultiIndex.from_product([date_index, symbol_index], names=['Date', 'Symbol'])
col = ['val_to_change','new_val']
data = pd.DataFrame(index=idx,columns=col)
I want to make some calculations row by row like this:
initial_variable = 0
def calc(row):
global initial_variable
row['val_to_change'] = initial_variable
#I make further calculations here with row['val_to_change']
row['new_val'] = row['val_to_change'] + 500
initial_variable = row['new_val'].values
return row
data = data.groupby('Date').apply(calc)
This gives me the desired output:
val_to_change new_val
Date Symbol
2019-01-01 AAPL 0 500
BOA 0 500
GE 0 500
MSFT 0 500
2019-01-02 AAPL 500 1000
BOA 500 1000
GE 500 1000
MSFT 500 1000
2019-01-03 AAPL 1000 1500
BOA 1000 1500
GE 1000 1500
MSFT 1000 1500
However, if the dataframe does not have the same amount of Symbol
level index for each Date
index, then I get an error for the above function. If I drop a Symbol
from a Date
:
data.drop(('2019-01-01','AAPL'),inplace=True)
And now apply the same function to data
, then I get an error:
ValueError: Length of values does not match length of index
because the first Date
index has now only 3 Symbol
indices. How can I overcome this problem? I want to apply a function like this on my dataframe, which has a multiindex but the multiindex does not have the same amount of Symbol
index for every Date
index.
Use DataFrame.unstack
with DataFrame.stack
and parameter dropna=False
for avoid drop new missing rows filled by NaN
s:
data.drop(('2019-01-01','AAPL'),inplace=True)
data = data.unstack().stack(dropna=False)
data = data.groupby('Date').apply(calc)
print (data)
val_to_change new_val
Date Symbol
2019-01-01 AAPL 0 500
BOA 0 500
GE 0 500
MSFT 0 500
2019-01-02 AAPL 500 1000
BOA 500 1000
GE 500 1000
MSFT 500 1000
2019-01-03 AAPL 1000 1500
BOA 1000 1500
GE 1000 1500
MSFT 1000 1500