Pandas version 0.25 supports "Named Aggregation" via function agg
and namedtuples
. You need to pass column, aggregator pairs as the doc describes. It also says:
If your aggregation functions require additional arguments, partially apply them with functools.partial().
I would like to apply this principle to get a weighted average (besides a simple count and average). My input table is
import pandas as pd
t = pd.DataFrame({'bucket':['a', 'a', 'b', 'b', 'b'], 'weight': [2, 3, 1, 4, 3],
'qty': [100, 500, 200, 800, 700]})
and my query fails:
import functools
import numpy as np
t.groupby('bucket').agg(
NR= ('bucket', 'count'),
AVG_QTY= ('qty', np.mean),
W_AVG_QTY= ('qty', functools.partial(np.average, weights='weight'))
)
with an error message:
TypeError: 1D weights expected when shapes of a and weights differ.
I assume the problem comes from fixing the parameter to be another column instead of a constant? How can I make this work without the workaround that uses apply
and a lambda expression that returns a Series
?
A weighted average requires 2 separate Series (i.e. a DataFrame). Because of this GroupBy.apply
is the correct aggregation method to use. Use pd.concat
to join the results.
pd.concat([t.groupby('bucket').agg(NR = ('bucket', 'count'),
AVG_QTY = ('qty', np.mean)),
(t.groupby('bucket').apply(lambda gp: np.average(gp.qty, weights=gp.weight))
.rename('W_AVG_QTY'))],
axis=1)
# NR AVG_QTY W_AVG_QTY
#bucket
#a 2 300.000000 340.0
#b 3 566.666667 687.5
This can be done with agg
, assuming your DataFrame has a unique Index, though I can't guarantee it will be very performant given all the slicing. We create our own function that accepts the Series of values and the entire DataFrame. The function then subsets the DataFrame
using the Series to obtain the weights for each group.
def my_w_avg(s, df, wcol):
return np.average(s, weights=df.loc[s.index, wcol])
t.groupby('bucket').agg(
NR= ('bucket', 'count'),
AVG_QTY= ('qty', np.mean),
W_AVG_QTY= ('qty', functools.partial(my_w_avg, df=t, wcol='weight'))
)
# NR AVG_QTY W_AVG_QTY
#bucket
#a 2 300.000000 340.0
#b 3 566.666667 687.5