Search code examples
pythonpandaspivotbooleanquery

Pandas pivoted dataframe and multi-column Boolean comparison


I have a pivoted dataframe of the form

      Price             Units  
Buyer     B     G     S     B   G   S
Idx                                  
1         0  1.51     0     0  11   0
2      2.32  1.32     0    21  13   0
3         0     0  1.44     0   0  14

I am trying to do create another major column called "Flag" with B, G, S sub-columns using the logic that can be thought of as (cell-by-cell)

p['Flag'] = (p['Price'] < 2.0) & (p['Units'] > 13.5)

So the desired result (showing only the new columns)

       Flag
Buyer     B     G     S     
Idx                                  
1     False False False
2     False False False
3     False False  True

I have tried quite a few ways and the following comes closer than others

newp = p.join(((p['Price'] < 2.0) & (p['Units'] > 13.5)).rename(columns=dict(Price='Flag')))

but this has two issues

  1. The boolean output is incorrect for the bottom right corner. It should be true since the corresponding cell price is less than 2.0 and the corresponding cell units is more than 13.5.
  2. It gives the warning "UserWarning: merging between different levels can give an unintended result (2 levels on the left, 1 on the right)". I can't seem to get the major column name "Flag" into the dataframe.

Any ideas on fixing the Boolean conditions and merging at the correct level?

The code for generating the initial dataframe is

from collections import OrderedDict
import pandas as pd

table = OrderedDict((
    ("Idx", [1, 2, 2, 3]),
    ('Buyer',['G', 'B', 'G', 'S']),
    ('Price',  ['1.51', '2.32', '1.32', '1.44']),
    ('Units',   ['11', '21', '13', '14'])
))
d = pd.DataFrame(table)
p = d.pivot(index='Idx', columns='Buyer')
p.fillna(0, inplace=True)

Solution

  • I think you need convert string numbers to float by astype and then use concat:

    p = p.astype(float)
    
    newp = pd.concat([p['Price'], p['Units'], (p['Price'] < 2.0) & (p['Units'] > 13.5)], 
                     axis=1, 
                     keys=['Price','Units','Flag'])
    print (newp)
    
          Price             Units               Flag              
    Buyer     B     G     S     B     G     S      B      G      S
    Idx                                                           
    1      0.00  1.51  0.00   0.0  11.0   0.0  False  False  False
    2      2.32  1.32  0.00  21.0  13.0   0.0  False  False  False
    3      0.00  0.00  1.44   0.0   0.0  14.0  False  False   True
    

    Solution with join and MultiIndex.from_product for create new level:

    p = p.astype(float)
    
    a = (p['Price'] < 2.0) & (p['Units'] > 13.5)
    a.columns = pd.MultiIndex.from_product([['Flag'],a.columns])
    p = p.join(a)
    print (p)
          Price             Units               Flag              
    Buyer     B     G     S     B     G     S      B      G      S
    Idx                                                           
    1      0.00  1.51  0.00   0.0  11.0   0.0  False  False  False
    2      2.32  1.32  0.00  21.0  13.0   0.0  False  False  False
    3      0.00  0.00  1.44   0.0   0.0  14.0  False  False   True