I have a pandas dataframe
import pandas as pd
df = pd.DataFrame({'Col1' : 16 * ['A', 'B', 'C'],
'Col2' : np.random.rand(48),
'Col3' : np.random.randint(5, 20, 48)},
index = pd.date_range('2017-01-01', periods=48, freq='H'))
In [1]: df.tail()
Out [1]:
Col1 Col2 Col3
2017-01-02 19:00:00 B 0.144572 7
2017-01-02 20:00:00 C 0.740500 11
2017-01-02 21:00:00 A 0.357077 19
2017-01-02 22:00:00 B 0.652536 9
2017-01-02 23:00:00 C 0.022437 8
I want to return a dataframe that displays the weighted average of Col3 by date where Col2 is the weighting and Col1 is either 'B' or 'C' while ignoring 'A.' This would return something that looks like the following.
WtdAvg
2017-01-01 XX.X
2017-01-02 YY.Y
Filter the DataFrame to remove values where Col1 is 'A', then perform a groupby
using np.average
:
df[df['Col1'] != 'A'].groupby(pd.TimeGrouper('D')) \
.apply(lambda grp: np.average(grp['Col3'], weights=grp['Col2']))
The resulting output (using np.random.seed([3,1415])
as the random state seed) :
2017-01-01 11.975517
2017-01-02 12.411798