Search code examples
pandasaggregate

Pandas groupby and aggregate on some operation between columns


I'm an R and data.tableuser trying to use more Python and pandas, and when I want to do the following in pandas, I get so frustrated because there does not seem to be an easy way:

df = pd.DataFrame({'a': [1, 1, 2, 2], 'b': [0.2, 0.2, 0.2, 0.1], 'c': [10, 9, 10, 22], 'd': [2, 3, 3, 1]})

In R if I wanted to sum the product of b and c and divide on the sum of d, grouped on a, I would just do:

df[, sum(b*c)/sum(d), by=a]

, but this seems hard for me to do elegantly in pandas. I mean no offence, please don't hate me, it is probably just me not understanding pandas yet.

Solution thus far:

df[["b", "c"]].prod(axis=1).groupby(df["a"]).sum()/df.groupby("a").d.sum()

Solution

  • I would have done the same as you did.

    If you want you can also use groupby.apply:

    df.groupby('a').apply(lambda g: (g['b']*g['c']).sum()/g['d'].sum())
    

    output:

    a
    1    0.76
    2    1.05
    dtype: float64