I have a data frame that looks like this that includes price
side
and volume
parameters from multiple exchanges.
df = pd.DataFrame({
'price_ex1' : [9380.59650, 9394.85206, 9397.80000],
'side_ex1' : ['bid', 'bid', 'ask'],
'size_ex1' : [0.416, 0.053, 0.023],
'price_ex2' : [9437.24045, 9487.81185, 9497.81424],
'side_ex2' : ['bid', 'bid', 'ask'],
'size_ex2' : [10.0, 556.0, 23.0]
})
df
price_ex1 side_ex1 size_ex1 price_ex2 side_ex2 size_ex2
0 9380.59650 bid 0.416 9437.24045 bid 10.0
1 9394.85206 bid 0.053 9487.81185 bid 556.0
2 9397.80000 ask 0.023 9497.81424 ask 23.0
For each exchange (I have more than two exchanges), I want the index to be the union of all prices from all exchanges (i.e. union of price_ex1
, price_ex2
, etc...) ranked from highest to lowest. Then I want to create two size
columns for each exchange based on the side
parameter of that exchange. The output should look like this where empty columns are NaN
.
I am not sure what is the best pandas function to do this, whether it is pivot or melt and how to use that function when I have more than 1 binary column I am flattening.
Thank you for your help!
This is a three step process. After you correct your multiindexed columns, you should stack your dataset, then pivot it.
First, clean up the multiindex columns so that you more easily transform:
df.columns = pd.MultiIndex.from_product([['1', '2'], [col[:-4] for col in df.columns[:3]]], names=['exchange', 'params'])
exchange 1 2
params price side size price side size
0 9380.59650 bid 0.416 9437.24045 bid 10.0
1 9394.85206 bid 0.053 9487.81185 bid 556.0
2 9397.80000 ask 0.023 9497.81424 ask 23.0
Then stack and append the exchange num to the bid
and ask
values:
df = df.swaplevel(axis=1).stack()
df['side'] = df.apply(lambda row: row.side + '_ex' + row.name[1], axis=1)
params price side size
exchange
0 1 9380.59650 bid_ex1 0.416
2 9437.24045 bid_ex2 10.000
1 1 9394.85206 bid_ex1 0.053
2 9487.81185 bid_ex2 556.000
2 1 9397.80000 ask_ex1 0.023
2 9497.81424 ask_ex2 23.000
Finally, pivot and sort by price:
df.pivot_table(index=['price'], values=['size'], columns=['side']).sort_values('price', ascending=False)
params size
side ask_ex1 ask_ex2 bid_ex1 bid_ex2
price
9497.81424 NaN 23.0 NaN NaN
9487.81185 NaN NaN NaN 556.0
9437.24045 NaN NaN NaN 10.0
9397.80000 0.023 NaN NaN NaN
9394.85206 NaN NaN 0.053 NaN
9380.59650 NaN NaN 0.416 NaN