Search code examples
pythonpandasaggregateweighted-average

Calculate weighted average without using groupby


I am calculating the weighted mean on the following data. I get a positional indexers are out of bounds error.

dfpa:

PA  pa_pop  pa_mean_ea_kwh
A   30      42431.52608
B   1596    177765.6662
C   193     284501.2339
D   0   
E   84      316868.3264

this is my code:

wmea = lambda x: np.ma.average(x, weights=(dfpa.loc[x.index, 'pa_mean_ea_kwh'] * dfpa.loc[x.index, 'pa_pop']))
dfsw = dfpa.agg(
    sw_pop = ('pa_pop', 'sum'),
    sw_mean_ea_kwh = ('pa_mean_ea_kwh', wmea))

In the past I've successfully used similar code, but my aggregation is coupled with a groupby. So it would be

dfsw  = dfpa.groupby('PA').agg(
    sw_pop = ('pa_pop', 'sum'),
    sw_mean_ea_kwh = ('pa_mean_ea_kwh', wmea))

But in this case I just need a single output for all the data. Is the np.ma.average lambda function incorrect to achieve this? Why won't it just do a sumproduct and then divide by the sum?

adding desired output:

sw_pop  sw_mean_ea_kwh
1903    192597.2814

where sw_mean_ea_kwh is calculated as SUMPRODUCT(pa_pop,pa_mean_ea_kwh)/SUM(pa_pop)

ps: there's an alternative solution here to compute weighted average, but it again uses groupby


Solution

  • This works similar to the sumproduct and dividing by the sum approach:

    (dfpa['pa_pop']*dfpa['pa_mean_ea_kwh']).sum()/(dfpa['pa_pop'].sum())