Search code examples
pythonpandasgroup-byfiltering

Filtering data based on boolean columns in python


I have the following pandas dataframe and I would like a function that returns the ID's data with at least 1 True value in bool_1, 2 True values in bool_2 and 3 True values in bool_3 column, using the groupby function.

index  ID  bool_1  bool_2  bool_3
 0      7   True    True    True    
 1      7   False   True    True
 2      7   False   False   True
 3      8   True    True    True
 4      8   True    True    True    
 5      8   False   False   True    
 6      9   True    True    True    
 7      9   True    False   True    
 8      9   True    False   True    
 9      9   True    False   False   

As output I would expect complete data for ID 7 and 8 to be returned, since 9 has only 1 True value for bool_2. Any idea for that function? Thank you!


Solution

  • You can specify number of Trues in dictionary, so possible compare by DataFrame.ge for greate or equal count of Trues by aggregate sum and filter original DataFrame by boolean indexing with Series.isin:

    d = {'bool_1':1, 'bool_2':2,'bool_3':3}
    
    ids = df.groupby('ID')[list(d.keys())].sum().ge(d).all(axis=1)
    print (ids)
    ID
    7     True
    8     True
    9    False
    dtype: bool
    
    out = df[df['ID'].isin(ids.index[ids])]
    print (out)
       index  ID  bool_1  bool_2  bool_3
    0      0   7    True    True    True
    1      1   7   False    True    True
    2      2   7   False   False    True
    3      3   8    True    True    True
    4      4   8    True    True    True
    5      5   8   False   False    True
    

    Another idea is use GroupBy.transform for create boolean mask:

    d = {'bool_1':1, 'bool_2':2,'bool_3':3}
    
    mask = df.groupby('ID')[list(d.keys())].transform('sum').ge(d).all(axis=1)
    print (mask)
    0     True
    1     True
    2     True
    3     True
    4     True
    5     True
    6    False
    7    False
    8    False
    9    False
    dtype: bool
    
    out = df[mask]
    print (out)
       index  ID  bool_1  bool_2  bool_3
    0      0   7    True    True    True
    1      1   7   False    True    True
    2      2   7   False   False    True
    3      3   8    True    True    True
    4      4   8    True    True    True
    5      5   8   False   False    True