I am calculating the weighted mean on the following data. I get a positional indexers are out of bounds error.
dfpa:
PA pa_pop pa_mean_ea_kwh
A 30 42431.52608
B 1596 177765.6662
C 193 284501.2339
D 0
E 84 316868.3264
this is my code:
wmea = lambda x: np.ma.average(x, weights=(dfpa.loc[x.index, 'pa_mean_ea_kwh'] * dfpa.loc[x.index, 'pa_pop']))
dfsw = dfpa.agg(
sw_pop = ('pa_pop', 'sum'),
sw_mean_ea_kwh = ('pa_mean_ea_kwh', wmea))
In the past I've successfully used similar code, but my aggregation is coupled with a groupby. So it would be
dfsw = dfpa.groupby('PA').agg(
sw_pop = ('pa_pop', 'sum'),
sw_mean_ea_kwh = ('pa_mean_ea_kwh', wmea))
But in this case I just need a single output for all the data. Is the np.ma.average lambda function incorrect to achieve this? Why won't it just do a sumproduct and then divide by the sum?
adding desired output:
sw_pop sw_mean_ea_kwh
1903 192597.2814
where sw_mean_ea_kwh is calculated as SUMPRODUCT(pa_pop,pa_mean_ea_kwh)/SUM(pa_pop)
ps: there's an alternative solution here to compute weighted average, but it again uses groupby
This works similar to the sumproduct and dividing by the sum approach:
(dfpa['pa_pop']*dfpa['pa_mean_ea_kwh']).sum()/(dfpa['pa_pop'].sum())