Search code examples
pythonpandasdataframegroup-bymax

How to get max of counts for groupby (most frequent items)


I have a dataframe. I want to group by rows on some columns and then form a count column and then get the max of counts and create a column for it and attach it to dataframe.

I tried:

    df["max_pred"] = df.groupby(['fid','prefix','pred_text1'], 
                             sort=False)["pred_text1"].transform("max")

However it lists a row with max repeat for pred_text1, but I want the number of reparation for it

For example:

A  B  C
a  d  b
a  d  b
a  d  b
a  d  a
a  d  a
b  b  c
b  b  c
b  b  d

If I group the rows by A and B and then count C and get its max for each group and store that in new column F, I expect:

A  B  F   E
a  d  3   b
a  d  3   b
a  d  3   b
a  d  3   b
a  d  3   b
b  b  2   c
b  b  2   c
b  b  2   c

E shows the most frequent item whose frequency was specified in F


Solution

  • You can use groupby.transform with value_counts:

    df['F'] = (df.groupby(['A', 'B'])['C']
                 .transform(lambda g: g.value_counts(sort=False).max())
              )
    

    Variant with collections.Counter:

    from collections import Counter
    
    df['F'] = (df.groupby(['A', 'B'])['C']
                 .transform(lambda g: max(Counter(g).values()))
              )
    

    Output:

       A  B  C  F
    0  a  d  b  3
    1  a  d  b  3
    2  a  d  b  3
    3  a  d  a  3
    4  a  d  a  3
    5  b  b  c  2
    6  b  b  c  2
    7  b  b  d  2
    

    update

    I would use a merge here:

    cols = ['A', 'B']
    out = df.merge(df[cols+['C']]
                     .value_counts().groupby(cols).head(1)
                     .reset_index(name='F').rename(columns={'C': 'E'})
                   )
    

    Output:

       A  B  C  E  F
    0  a  d  b  b  3
    1  a  d  b  b  3
    2  a  d  b  b  3
    3  a  d  a  b  3
    4  a  d  a  b  3
    5  b  b  c  c  2
    6  b  b  c  c  2
    7  b  b  d  c  2