My dataframe looks like this
bid_ask.head(6)
Out[3]:
close high low open
timestamp direction
2018-08-16 21:00:00 bid 1.27156 1.27189 1.26899 1.27100
ask 1.27177 1.27270 1.27067 1.27210
volume 1254.00000 1254.00000 1254.00000 1254.00000
2018-08-16 22:00:00 bid 1.27131 1.27154 1.27112 1.27152
ask 1.27152 1.27195 1.27138 1.27173
volume 242.00000 242.00000 242.00000 242.00000
I am trying to perform the following operation on my dataframe to add rows using pd.IndexSlice as an analogue to the case with a simple dataframe index.
bid_ask.loc[pd.IndexSlice[:, 'mid'], :] = 0.5 * bid_ask.loc[pd.IndexSlice[:, ('bid', 'ask')], :].groupby(level="timestamp").sum()
However, I get the following error:
KeyError: 'mid'
It seems that it expects "mid" to be in the dataframe rather than inserting it.
I can achieve the desired result by doing the following
for i in bid_ask.index.get_level_values(0):
bid_ask.ix[(i, "mid"), :] = 0.5 * (bid_ask.ix[(i, "bid"), :] +
bid_ask.ix[(i, "ask"), :])
This takes significantly longer though, especially if my dataset is large and is therefore undesirable.
How can I achieve this with slicing or any other faster alternative?
Using MultiIndex
and concat
new=0.5 * df.loc[pd.IndexSlice[:, ('bid', 'ask')], :].groupby(level="timestamp").sum()
new.index=pd.MultiIndex.from_product([new.index, ['mid']])
pd.concat([df,new]).sort_index()
Out[81]:
close high low open
timestamp direction
2018-08-161 ask 1.271770 1.272700 1.27067 1.272100
bid 1.271560 1.271890 1.26899 1.271000
mid 1.271665 1.272295 1.26983 1.271550
volume 1254.000000 1254.000000 1254.00000 1254.000000
2018-08-162 ask 1.271520 1.271950 1.27138 1.271730
bid 1.271310 1.271540 1.27112 1.271520
mid 1.271415 1.271745 1.27125 1.271625
volume 242.000000 242.000000 242.00000 242.000000