Search code examples
pythonpandasgroup-bypandas-apply

How can I apply multiple functions involving multiple columns of a pandas dataframe with grouby?


Considering the following datafrme:

id cat date max score
1 s1 A 12/06 9 5.4
2 s1 B 12/06 10 5.4
3 s2 C 11/04 13 4.2
4 s2 D 11/04 28 10
5 s3 E 08/02 16 5.4
5 s3 F 08/02 6 5.4

I want to group by "id" and get the first of the columns [cat, date] but at the same time I want to get the mean of the max - score columns. I think it can be done with an aggregate:

but with aggregate I can't access all the columns of the dataframe.

id cat date mean(max-score)
s1 A 12/06 4.1
s2 C 11/04 13.4
s3 E 08/02 5.6

Solution

  • You first need to assign max-score to mean(max-score), then this is a simple groupby+agg:

    (df.assign(**{'mean(max-score)': df['max']-df['score']})
       .groupby('id', as_index=False)
       .agg({'cat': 'first', 'date': 'first', 'mean(max-score)': 'mean'})
    )
    

    output:

       id cat   date  mean(max-score)
    0  s1   A  12/06              4.1
    1  s2   C  11/04             13.4
    2  s3   E  08/02              5.6