Search code examples
pythonpandasdataframegroup-by

Filter Groupby if contains specific values


Say I have the following dataframe covering hundreds of different names:

df = pd.DataFrame({"NAME":["A", "A", "A" ,"A", 
                       "B", "B", "B", 
                       "C", "C", "C", "C",
                       "D", "D",],
               "CLASS":["LI","BO","EQ","AI",
                           "LI", "EQ", "AI", 
                           "LI","BO","EQ","AI",
                           "EQ", "AI", 
                          ]})

Using the groupby-function of Pandas, how am I supposed to filter out only those names that contain both EQ AND AI but NOT BO? Ideally, I should obtain the following:

NAME CLASS
B    LI
B    EQ
B    AI
D    EQ
D    AI

Any suggestions?


Solution

  • Use groupby.transform with set operations:

    out = df[df.groupby('NAME')['CLASS']
               .transform(lambda x: ((S:=set(x))>={'EQ', 'AI'}) and ('BO' not in S))]
    # or
    # out = df[df.groupby('NAME')['CLASS']
    #            .transform(lambda x: ((S:=set(x))>={'EQ', 'AI'})
    #                       and not S.intersection({'BO'}))]
    

    Variant with groupby.agg and isin:

    g = df.groupby('NAME')['CLASS'].agg(set)
    out = df[df['NAME'].isin(g.index[(g >= {'EQ', 'AI'}) & ~(g >= {'BO'})])]
    

    Output:

       NAME CLASS
    4     B    LI
    5     B    EQ
    6     B    AI
    11    D    EQ
    12    D    AI
    
    Intermediates

    with an extra group for completeness

    # transform approach
       NAME CLASS  g >= {'EQ', 'AI'}  ~(g >= {'BO'})      &
    0     A    LI               True           False  False
    1     A    BO               True           False  False
    2     A    EQ               True           False  False
    3     A    AI               True           False  False
    4     B    LI               True            True   True
    5     B    EQ               True            True   True
    6     B    AI               True            True   True
    7     C    LI               True           False  False
    8     C    BO               True           False  False
    9     C    EQ               True           False  False
    10    C    AI               True           False  False
    11    D    EQ               True            True   True
    12    D    AI               True            True   True
    13    E    AI              False            True  False
    
    # agg approach
                         g  (g >= {'EQ', 'AI'})  ~(g >= {'BO'})      &
    NAME                                                              
    A     {LI, EQ, BO, AI}                 True           False  False
    B         {LI, EQ, AI}                 True            True   True
    C     {LI, EQ, BO, AI}                 True           False  False
    D             {EQ, AI}                 True            True   True
    E                 {AI}                False            True  False