I am trying to compute the rolling means of values after grouping by. My dataset looks like
import pandas as pd
df = pd.DataFrame({'day': ['2020-01-01', '2020-01-01', '2020-01-01', '2020-01-01', '2020-01-02', '2020-01-02', '2020-01-02', '2020-01-02', '2020-01-03', '2020-01-03', '2020-01-03','2020-01-03'],
'weather': ['rain', 'sun', 'rain', 'sun', 'rain', 'sun', 'rain', 'sun', 'rain', 'sun', 'rain', 'sun'],
'value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]})
Now, I want to have a rolling mean per day, and per weather.
While
>>>> df.groupby(['day', 'weather']).value.mean()
day weather
2020-01-01 rain 2
sun 3
2020-01-02 rain 6
sun 7
2020-01-03 rain 10
sun 11
proper computes the mean, the rolling version of it, does not seem to work?
>>>> df.groupby(['day', 'weather']).value.rolling(2).mean()
day weather
2020-01-01 rain 0 NaN
2 2.0
sun 1 NaN
3 3.0
2020-01-02 rain 4 NaN
6 6.0
sun 5 NaN
7 7.0
2020-01-03 rain 8 NaN
10 10.0
sun 9 NaN
11 11.0
What's the right way of doing it?
I would expect an output that is the mean over multiple days i.e. (ignore the index)
day weather
2020-01-01 rain 2 2.0
sun 3 3.0
2020-01-02 rain 6 4.0
sun 5 5.0
2020-01-03 rain 8 8.0
sun 9 9.0
I think you are referring to the NaN
values? Your window is set to 2
so the first value of each group will be set to NaN
because of min_periods
. Here is a quote of the documentation :
For a window that is specified by an integer, min_periods will default to the size of the window.
df.groupby(['day', 'weather']).value.rolling(2,min_periods=1).mean()
day weather
2020-01-01 rain 0 1.0
2 2.0
sun 1 2.0
3 3.0
2020-01-02 rain 4 5.0
6 6.0
sun 5 6.0
7 7.0
2020-01-03 rain 8 9.0
10 10.0
sun 9 10.0
11 11.0
Was that what you are searching for?
UPDATE
You want a mean value for each day and wheather and than a rolling mean over x days of the computed mean (if I understand that right). Try this:
out = df.groupby(['day','weather',],as_index=False)['value'].mean()
print(out)
day weather value
0 2020-01-01 rain 2.0
1 2020-01-01 sun 3.0
2 2020-01-02 rain 6.0
3 2020-01-02 sun 7.0
4 2020-01-03 rain 10.0
5 2020-01-03 sun 11.0
With the daily mean you can compute your rolling window:
out['rolling_mean'] = out.groupby('weather', as_index=False)['value'].rolling(2,min_periods=1).mean()['value']
print(out)
day weather value rolling_mean
0 2020-01-01 rain 2.0 2.0
1 2020-01-01 sun 3.0 3.0
2 2020-01-02 rain 6.0 4.0
3 2020-01-02 sun 7.0 5.0
4 2020-01-03 rain 10.0 8.0
5 2020-01-03 sun 11.0 9.0