Search code examples
pythonnumpypandasdataframeaverage-precision

Apply rolling mean function on data frames with duplicated indices in pandas


I have difficulty to use pd.rolling_mean function on the following data frame containing duplicated indices:

               amount
    20140101    3
    20140102    4
    20140103    3
    20140103    5
    20140103    1
    20140104    5
    20140105    6
    20140106    2
    …

I need to calculate the 3 day average of 'amount', for example, average from 20140101 to 20140103 should be (3+4+3+5+1)/5=3.2, the average of amount from 20140104 to 20140106 should be (5+6+2)/3=4.3

does anyone know how to do it? Thank you in advance!


Solution

  • you can do:

    >>> df
              amount
    20140101       3
    20140102       4
    20140103       3
    20140103       5
    20140103       1
    20140104       5
    20140105       6
    20140106       2
    >>> xf = df.groupby(level=0)['amount'].agg(['sum', 'count'])
    >>> xf
              sum  count
    20140101    3      1
    20140102    4      1
    20140103    9      3
    20140104    5      1
    20140105    6      1
    20140106    2      1
    >>> pd.rolling_sum(xf['sum'], 3, 0) / pd.rolling_sum(xf['count'], 3, 0)
    20140101    3.000
    20140102    3.500
    20140103    3.200
    20140104    3.600
    20140105    4.000
    20140106    4.333
    dtype: float64
    

    and you get 3.2 and 4.3 for 20140103 and 20140106 respectively.