Search code examples
pandasdataframeconditional-statementssample

Keep rows of dataframe if multiple conditions met


Given the following dataframe:

df = pd.DataFrame({'A': ["EQ", "CB", "CB", "FF", "EQ", "EQ", "CB", "CB"],
                   'B': ["ANT", "ANT", "DQ", "DQ", "BQ", "VGQ", "GHB", "VGQ"]})

How can I keep the rows of column B if it meets the condition of exist for both EQ and CB. For example, I would want to keep ANT because it exists for both EQ and CB, while DQ would be deleted. So the expected output for the df would be:

out = pd.DataFrame({'A': ["EQ", "CB", "EQ", "CB"],
                       'B': ["ANT", "ANT", "VGQ", "VGQ"]})

Thanks!


Solution

  • Here is a solution not using groupby() if you want code that may be easier to think about:

    equities = df.B[df.A == 'EQ']
    bonds = df.B[df.A == 'CB']
    both = equities[equities.isin(bonds)]
    

    That gives you:

    0    ANT
    5    VGQ
    

    Which makes the last part easy:

    df[df.B.isin(both)]
    Out: 
        A    B
    0  EQ  ANT
    1  CB  ANT
    5  EQ  VGQ
    7  CB  VGQ
    

    This is 3x faster on small data sets than groupby().filter().