Search code examples
pandasdataframegroup-by

Find mode for each group in pandas DataFrame


I have a dataframe with the following 3 columns: property, value, count

Pairs of (property, value) are unique.

I need to add a new column to this dataframe, that will check if the value in count is the most frequent within its property column value. So, for example:

We have a dataframe

property,value,count
p1,v1,20
p1,v2,50
p1,v3,50
p2,v4,10
p2,v5,20

And as a result we need to get:

property,value,count,is_mode
p1,v1,20,False
p1,v2,50,True
p1,v3,50,True
p2,v4,10,True
p2,v5,20,False 
  • in case when there are multiple modes I need to use the minimum one for calculating the is_mode value. So here, for p2 there are modes - 10, 20, but since I need the minimum one, for 10 is_mode will be True and for 20 it will be False

I've found numerous solutions from stackoverflow already, but these didn't lead me to any good result. For some reason

I tried to split the task into first getting the mode for each property and then somehow compare the values to the minimum mode for this group:

new_df = df.groupby(["property"])["count"].agg(pd.Series.mode)

same if I try to attach it to the current dataframe:

properties["mode"] = properties.groupby(["property"])["count"].transform(pd.Series.mode)

these doesn't crush however they produce weird results with lots of Nans. For example, I have:

property,value,count
p1,v1,200
p1,v2,60
p1,v3,60

and expect to get:

property,value,count,mode
p1,v1,200,60
p1,v2,60,60
p1,v3,60,60

but in my case I get:

property,value,count,mode
p1,v1,200,NaN
p1,v2,60,NaN
p1,v3,60,NaN

Solution

  • Assuming you want the smallest mode of the counts per group, use a custom function and groupby.transform.

    For each group, get the mode (one or more values), then the min, flag values that are equal to it:

    df['is_mode'] = (df.groupby('property')['count']
                       .transform(lambda s: s.eq(s.mode().min()))
                    )
    

    Output:

      property value  count  is_mode
    0       p1    v1     20    False
    1       p1    v2     50     True
    2       p1    v3     50     True
    3       p2    v4     10     True
    4       p2    v5     20    False