Tags: python, for-loop, group-by, categories, exploratory-data-analysis

Select or drop categories based on condition


I have this example dataset:

[Image: example dataset with columns `id` and `vals`]

What I'm trying to do is find which categories in the `id` column have all of their values strictly greater than 45, while also seeing the ones that don't. In this example, it should tell me that IDs 'a' and 'd' meet the criterion, while 'b' and 'c' don't. Afterwards, I'll drop the rows for 'b' and 'c'.

What's the simplest way of doing that?
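For illustration, the check described above can be sketched like this. The real data was only posted as an image, so the DataFrame below is a hypothetical stand-in built so that 'a' and 'd' pass and 'b' and 'c' do not. The idea: an id meets the criterion only if even its smallest value is strictly above 45.

```python
import pandas as pd

# Hypothetical stand-in for the dataset in the image (not the original data)
df = pd.DataFrame({
    'id':   ['a', 'a', 'b', 'b', 'c', 'c', 'd', 'd'],
    'vals': [50, 60, 40, 70, 45, 90, 46, 100],
})

# One boolean per id: True only if every value of that id is > 45
passes = df.groupby('id')['vals'].min() > 45

keep_ids = passes[passes].index.tolist()   # ids meeting the criterion
drop_ids = passes[~passes].index.tolist()  # ids that do not

print('keep:', keep_ids)  # keep: ['a', 'd']
print('drop:', drop_ids)  # drop: ['b', 'c']
```

Note that 'c' fails because its minimum is exactly 45, which is not strictly greater than 45.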

I tried:

    def filter_func(x):
        return x['vals'] > 45

    df.groupby('id').filter(filter_func)
    df['id'].unique()

but I get this error:

    filter function returned a Series, but expected a scalar bool
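The error comes from how `groupby(...).filter` works: it calls the function once per group and expects a single True/False back, but `x['vals'] > 45` returns one boolean per row. A minimal sketch of the same attempt, reduced to a scalar with `.all()`, on hypothetical data (the real dataset was only shown as an image):

```python
import pandas as pd

# Hypothetical stand-in for the dataset in the image (not the original data)
df = pd.DataFrame({
    'id':   ['a', 'a', 'b', 'b', 'c', 'c', 'd', 'd'],
    'vals': [50, 60, 40, 70, 45, 90, 46, 100],
})

def filter_func(x):
    # x is the sub-DataFrame for one id; .all() collapses the per-row
    # booleans into the single bool that groupby.filter expects
    return (x['vals'] > 45).all()

# Keeps every row of the ids whose values are all > 45, drops the rest
filtered = df.groupby('id').filter(filter_func)

print(filtered['id'].unique().tolist())  # ['a', 'd']
```

This does the selection and the dropping in one step; the ids that were removed can be recovered with `set(df['id']) - set(filtered['id'])`.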

Solution

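You can take the minimum of `vals` for each `id` and keep the ids whose minimum is above 45 (if the smallest value is above 45, all of them are):

    # Smallest value per id, as a DataFrame with columns 'id' and 'vals'
    df2 = df.groupby('id')['vals'].min().reset_index()
    df2.loc[df2['vals'] > 45, 'id']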
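To then drop the rows for the ids that fail (here 'b' and 'c'), one option is to pass the passing ids to `isin`. A sketch on hypothetical data, assuming the frame has only the `id` and `vals` columns:

```python
import pandas as pd

# Hypothetical stand-in for the dataset in the image (not the original data)
df = pd.DataFrame({
    'id':   ['a', 'a', 'b', 'b', 'c', 'c', 'd', 'd'],
    'vals': [50, 60, 40, 70, 45, 90, 46, 100],
})

# Same idea as the answer: smallest value per id
df2 = df.groupby('id')['vals'].min().reset_index()
keep_ids = df2.loc[df2['vals'] > 45, 'id']

# Keep only rows whose id passed; rows for 'b' and 'c' are dropped
df_kept = df[df['id'].isin(keep_ids)].reset_index(drop=True)

print(df_kept)
```

`isin` accepts any list-like, so `keep_ids` can be passed as the Series returned by `.loc` without converting it first.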