Given the following dataframe:
df = pd.DataFrame({'A': ["EQ", "CB", "CB", "FF", "EQ", "EQ", "CB", "CB"],
'B': ["ANT", "ANT", "DQ", "DQ", "BQ", "VGQ", "GHB", "VGQ"]})
How can I keep the rows of column B
if it meets the condition of exist for both EQ and CB
. For example, I would want to keep ANT
because it exists for both EQ
and CB
, while DQ
would be deleted. So the expected output for the df
would be:
out = pd.DataFrame({'A': ["EQ", "CB", "EQ", "CB"],
'B': ["ANT", "ANT", "VGQ", "VGQ"]})
Thanks!
Here is a solution not using groupby() if you want code that may be easier to think about:
equities = df.B[df.A == 'EQ']
bonds = df.B[df.A == 'CB']
both = equities[equities.isin(bonds)]
That gives you:
0 ANT
5 VGQ
Which makes the last part easy:
df[df.B.isin(both)]
Out:
A B
0 EQ ANT
1 CB ANT
5 EQ VGQ
7 CB VGQ
This is 3x faster on small data sets than groupby().filter().