Search code examples
pythonpandasweighted-average

Weighted average, grouped by day, of 2 columns based on criteria in 3rd column of pandas dataframe


I have a pandas dataframe

import pandas as pd
df = pd.DataFrame({'Col1' : 16 * ['A', 'B', 'C'], 
                   'Col2' : np.random.rand(48), 
                   'Col3' : np.random.randint(5, 20, 48)},
                   index = pd.date_range('2017-01-01', periods=48, freq='H'))

In [1]: df.tail()
Out [1]: 
                    Col1      Col2  Col3
2017-01-02 19:00:00    B  0.144572     7
2017-01-02 20:00:00    C  0.740500    11
2017-01-02 21:00:00    A  0.357077    19
2017-01-02 22:00:00    B  0.652536     9
2017-01-02 23:00:00    C  0.022437     8

I want to return a dataframe that displays the weighted average of Col3 by date where Col2 is the weighting and Col1 is either 'B' or 'C' while ignoring 'A.' This would return something that looks like the following.

           WtdAvg
2017-01-01   XX.X
2017-01-02   YY.Y

Solution

  • Filter the DataFrame to remove values where Col1 is 'A', then perform a groupby using np.average:

    df[df['Col1'] != 'A'].groupby(pd.TimeGrouper('D')) \
                         .apply(lambda grp: np.average(grp['Col3'], weights=grp['Col2']))
    

    The resulting output (using np.random.seed([3,1415]) as the random state seed) :

    2017-01-01    11.975517
    2017-01-02    12.411798