Search code examples
pandaspivotmelt

Pandas melt data based on two or more binary columns


I have a data frame that looks like this that includes price side and volume parameters from multiple exchanges.

df = pd.DataFrame({
    'price_ex1' : [9380.59650, 9394.85206, 9397.80000],
    'side_ex1' : ['bid', 'bid', 'ask'],
    'size_ex1' : [0.416, 0.053, 0.023],
    'price_ex2' : [9437.24045, 9487.81185, 9497.81424],
    'side_ex2' : ['bid', 'bid', 'ask'],
    'size_ex2' : [10.0, 556.0, 23.0]
})

df


       price_ex1     side_ex1      size_ex1  price_ex2   side_ex2    size_ex2
0     9380.59650          bid         0.416  9437.24045       bid        10.0
1     9394.85206          bid         0.053  9487.81185       bid       556.0
2     9397.80000          ask         0.023  9497.81424       ask        23.0

For each exchange (I have more than two exchanges), I want the index to be the union of all prices from all exchanges (i.e. union of price_ex1, price_ex2, etc...) ranked from highest to lowest. Then I want to create two size columns for each exchange based on the side parameter of that exchange. The output should look like this where empty columns are NaN.

I am not sure what is the best pandas function to do this, whether it is pivot or melt and how to use that function when I have more than 1 binary column I am flattening.

Thank you for your help!

enter image description here


Solution

  • This is a three step process. After you correct your multiindexed columns, you should stack your dataset, then pivot it.

    First, clean up the multiindex columns so that you more easily transform:

    df.columns = pd.MultiIndex.from_product([['1', '2'], [col[:-4] for col in df.columns[:3]]], names=['exchange', 'params'])
    
    exchange           1                       2            
    params         price side   size       price side   size
    0         9380.59650  bid  0.416  9437.24045  bid   10.0
    1         9394.85206  bid  0.053  9487.81185  bid  556.0
    2         9397.80000  ask  0.023  9497.81424  ask   23.0
    

    Then stack and append the exchange num to the bid and ask values:

    df = df.swaplevel(axis=1).stack()
    df['side'] = df.apply(lambda row: row.side + '_ex' + row.name[1], axis=1)       
    
    params           price     side     size
      exchange                              
    0 1         9380.59650  bid_ex1    0.416
      2         9437.24045  bid_ex2   10.000
    1 1         9394.85206  bid_ex1    0.053
      2         9487.81185  bid_ex2  556.000
    2 1         9397.80000  ask_ex1    0.023
      2         9497.81424  ask_ex2   23.000
    

    Finally, pivot and sort by price:

    df.pivot_table(index=['price'], values=['size'], columns=['side']).sort_values('price', ascending=False) 
    
    params        size                        
    side       ask_ex1 ask_ex2 bid_ex1 bid_ex2
    price                                     
    9497.81424     NaN    23.0     NaN     NaN
    9487.81185     NaN     NaN     NaN   556.0
    9437.24045     NaN     NaN     NaN    10.0
    9397.80000   0.023     NaN     NaN     NaN
    9394.85206     NaN     NaN   0.053     NaN
    9380.59650     NaN     NaN   0.416     NaN