I have a Pandas dataframe with data like the following. I need to group by company and date and then compute the rolling average (e.g. 2 days, 3 days, etc.) within each group.
import pandas as pd

df = pd.DataFrame({
    'company': ['abc', 'abc', 'abc', 'xyz', 'xyz', 'xyz'],
    'dt': pd.to_datetime([
        '2022-01-01', '2022-01-02', '2022-01-03',
        '2022-01-31', '2022-02-01', '2022-02-02'
    ]),
    'volume': [1, 2, 3, 4, 5, 6]
})
If I do the following, the rolling average is across the companies.
df.groupby(['company', 'dt'])['volume'].mean().rolling(2).mean()
The result is as follows, but it is not what I want. When it gets to company xyz, it should only consider the dates/values associated with that company.
company  dt
abc      2022-01-01    NaN
         2022-01-02    1.5
         2022-01-03    2.5
xyz      2022-01-31    3.5
         2022-02-01    4.5
         2022-02-02    5.5
Additionally, I cannot specify the frequency to be 2D or 3D; if I do, then I get the following error: ValueError: window must be an integer 0 or greater.
Is the only way to do what I want to run separate code for each company?
df[df['company']=='abc'].groupby(['dt'])['volume'].mean().rolling('2D').mean()
df[df['company']=='xyz'].groupby(['dt'])['volume'].mean().rolling('2D').mean()
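There is a better way. The offset window fails in the question because, after grouping on both company and dt, the result is indexed by a MultiIndex rather than a datetime index. Group on company only and use dt as the column on which to calculate the rolling window: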
# dt must be datetime-like for an offset window such as '2D'
df['dt'] = pd.to_datetime(df['dt'])
df.groupby('company').rolling('2D', on='dt')['volume'].mean()
company  dt
abc      2022-01-01    1.0
         2022-01-02    1.5
         2022-01-03    2.5
xyz      2022-01-31    4.0
         2022-02-01    4.5
         2022-02-02    5.5
Name: volume, dtype: float64
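Since the question asks for several window sizes (2 days, 3 days, etc.) and groups by date first, here is a minimal sketch that extends the same idea. It assumes the sample data from the question; the helper name `rolling_means` and the output column names `mean_2D` / `mean_3D` are made up for illustration. The first step collapses any duplicate (company, dt) rows into one daily mean, which is a no-op on the sample data but matches the original groupby.

```python
import pandas as pd

df = pd.DataFrame({
    "company": ["abc", "abc", "abc", "xyz", "xyz", "xyz"],
    "dt": pd.to_datetime([
        "2022-01-01", "2022-01-02", "2022-01-03",
        "2022-01-31", "2022-02-01", "2022-02-02",
    ]),
    "volume": [1, 2, 3, 4, 5, 6],
})

# One row per (company, dt); the result is sorted by company, then dt,
# so dt is monotonic within each company as an offset window requires.
daily = df.groupby(["company", "dt"], as_index=False)["volume"].mean()


def rolling_means(frame, windows=("2D", "3D")):
    """Hypothetical helper: one per-company rolling mean column per window."""
    columns = {}
    for window in windows:
        columns[f"mean_{window}"] = (
            frame.groupby("company")
            .rolling(window, on="dt")["volume"]
            .mean()
        )
    # Each Series is indexed by (company, dt), so the columns align.
    return pd.DataFrame(columns)


result = rolling_means(daily)
print(result)
```

Each window is time-based, so a gap in the dates shrinks the number of rows averaged rather than reaching back to older rows; an integer window such as `rolling(2)` would instead always count rows.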