
Is there a way to specify day frequency in computing rolling averages with a multi-index pandas dataframe?


I have a pandas DataFrame with data like the following. I need to group by company and date and then compute a rolling average (e.g., over 2 days, 3 days, etc.) within each company.

import pandas as pd

df = pd.DataFrame({
    'company': ['abc', 'abc', 'abc', 'xyz', 'xyz', 'xyz'],
    'dt': pd.to_datetime([
        '2022-01-01', '2022-01-02', '2022-01-03',
        '2022-01-31', '2022-02-01', '2022-02-02'
    ]),
    'volume': [1, 2, 3, 4, 5, 6]
})

If I do the following, the rolling average is computed across companies:

df.groupby(['company', 'dt'])['volume'].mean().rolling(2).mean()

The result is as follows, but it is not what I want: when the window reaches company xyz, it should only consider the dates and values belonging to that company.

company  dt        
abc      2022-01-01    NaN
         2022-01-02    1.5
         2022-01-03    2.5
xyz      2022-01-31    3.5
         2022-02-01    4.5
         2022-02-02    5.5

Additionally, I cannot specify a time-based window such as '2D' or '3D'; if I do, I get the following error: ValueError: window must be an integer 0 or greater.
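The error arises because pandas only accepts a string window such as '2D' when it rolls over a datetime-like index (or a datetime column passed via on=); the (company, dt) MultiIndex is not one. The sketch below, with illustrative names, reproduces the error and shows that a single company's slice, whose index is a plain DatetimeIndex, accepts the same window.

```python
import pandas as pd

df = pd.DataFrame({
    'company': ['abc', 'abc', 'abc', 'xyz', 'xyz', 'xyz'],
    'dt': pd.to_datetime([
        '2022-01-01', '2022-01-02', '2022-01-03',
        '2022-01-31', '2022-02-01', '2022-02-02'
    ]),
    'volume': [1, 2, 3, 4, 5, 6]
})
daily = df.groupby(['company', 'dt'])['volume'].mean()

# The index here is a (company, dt) MultiIndex, so an offset window is rejected.
try:
    daily.rolling('2D').mean()
    raised = False
except ValueError as err:
    raised = True
    print('ValueError:', err)

# Selecting one company drops the 'company' level and leaves a DatetimeIndex,
# on which the same offset window works.
abc = daily.xs('abc', level='company')
print(type(abc.index).__name__)  # DatetimeIndex
abc_2d = abc.rolling('2D').mean()
print(abc_2d)
```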

Is the only way to get what I want to run separate code for each company, like this?

df[df['company']=='abc'].groupby(['dt'])['volume'].mean().rolling('2D').mean()
df[df['company']=='xyz'].groupby(['dt'])['volume'].mean().rolling('2D').mean()
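The two filtered lines above can be folded into a single expression by moving dt into the index and grouping on the company column, so each company is rolled over its own DatetimeIndex. A sketch, assuming dt is sorted within each company (true for the sample data); daily and rolled are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'company': ['abc', 'abc', 'abc', 'xyz', 'xyz', 'xyz'],
    'dt': pd.to_datetime([
        '2022-01-01', '2022-01-02', '2022-01-03',
        '2022-01-31', '2022-02-01', '2022-02-02'
    ]),
    'volume': [1, 2, 3, 4, 5, 6]
})

# One row per (company, dt), kept as ordinary columns
daily = df.groupby(['company', 'dt'], as_index=False)['volume'].mean()

# 'dt' becomes the index, so the offset window is valid;
# grouping on 'company' keeps each window within one company.
rolled = (
    daily.set_index('dt')
         .groupby('company')['volume']
         .rolling('2D')
         .mean()
)
print(rolled)
```

This gives the same values as running the filtered version once per company, without listing the companies by hand.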

Solution

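  • There is a better way. Group on company and use dt as the column over which to compute the rolling window:

    # Make sure 'dt' is datetime64 (a no-op if it already is)
    df['dt'] = pd.to_datetime(df['dt'])
    
    # One time-based 2-day window per company, rolled over the 'dt' column
    df.groupby('company').rolling('2D', on='dt')['volume'].mean()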
    

    company  dt        
    abc      2022-01-01    1.0
             2022-01-02    1.5
             2022-01-03    2.5
    xyz      2022-01-31    4.0
             2022-02-01    4.5
             2022-02-02    5.5
    Name: volume, dtype: float64
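Two properties of the time-based window are easier to see on data with a gap. An offset window such as '2D' covers the interval (t - 2 days, t] and defaults to min_periods=1, which is why the first row of each company above is 1.0 and 4.0 rather than NaN; a 2-row window, by contrast, ignores calendar gaps. The hypothetical gappy frame below (abc has no rows for 2022-01-03 or 2022-01-04) compares the two. It uses the set-index form of the same per-company grouping, and it sorts first because time-based windows expect dt to be increasing within each company.

```python
import pandas as pd

# Hypothetical data: abc has no rows for 2022-01-03 or 2022-01-04,
# and the rows are deliberately out of order.
gappy = pd.DataFrame({
    'company': ['abc', 'abc', 'abc', 'xyz', 'xyz'],
    'dt': pd.to_datetime(['2022-01-05', '2022-01-01', '2022-01-02',
                          '2022-01-31', '2022-02-01']),
    'volume': [3, 1, 2, 4, 5],
})

# Sort so that 'dt' is increasing within each company.
gappy = gappy.sort_values(['company', 'dt'])

# Per-company series indexed by date
g = gappy.set_index('dt').groupby('company')['volume']

by_time = g.rolling('2D').mean()  # calendar window: (t - 2 days, t]
by_rows = g.rolling(2).mean()     # last 2 rows, whatever their dates

print(pd.DataFrame({'2D window': by_time, '2-row window': by_rows}))
```

On 2022-01-05 the 2D window contains only that day (3.0), while the 2-row window still averages it with 2022-01-02 (2.5).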