We have the following pandas dataframe:
U
date
1990-02-28 NaN
1990-03-01 NaN
1990-03-02 -0.068554
1990-03-05 -0.056425
1990-03-06 -0.022294
1990-03-07 -0.038996
1990-03-08 -0.026863
I want to compute the rolling mean of column 'U' with a window size of 5 and min_periods = 4. Pandas' built-in rolling function counts only non-NaN values toward min_periods. Instead, I would like NaN rows to count toward both the window size and min_periods, without affecting the calculation of the mean (the mean should still be taken over the non-NaN values only). This is the expected output:
U rolling_mean
date
1990-02-28 NaN NaN
1990-03-01 NaN NaN
1990-03-02 -0.068554 NaN
1990-03-05 -0.056425 -0.062489
1990-03-06 -0.022294 -0.049091
1990-03-07 -0.038996 -0.046567
1990-03-08 -0.026863 -0.042626
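For comparison, here is a minimal sketch of what the built-in call gives on this data. The frame is rebuilt inside the snippet so it runs on its own, and the name `builtin` is only illustrative. Because min_periods counts non-NaN values, the first non-NaN result should only appear at 1990-03-07, once four valid observations are in the window.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question.
idx = pd.to_datetime(['1990-02-28', '1990-03-01', '1990-03-02', '1990-03-05',
                      '1990-03-06', '1990-03-07', '1990-03-08'])
df = pd.DataFrame(
    {'U': [np.nan, np.nan, -0.068554, -0.056425, -0.022294, -0.038996, -0.026863]},
    index=idx,
)

# Built-in behaviour: a window needs 4 non-NaN values before a mean is returned,
# so 1990-03-05 and 1990-03-06 stay NaN here, unlike in the expected output.
builtin = df['U'].rolling(5, min_periods=4).mean()
print(builtin)
```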
Is there any way to accomplish this without loops? Thanks.
Code
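Set min_periods to 0 so the mean is taken over whatever non-NaN values are in each window, then keep only the rows from position mp - 1 onward. On assignment, the first three rows are filled with NaN by index alignment.
mp = 4 # min_periods, counted over all rows (NaN included)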
df['rolling_mean'] = df['U'].rolling(5, min_periods=0).mean().iloc[mp - 1:]
df:
U rolling_mean
1990-02-28 NaN NaN
1990-03-01 NaN NaN
1990-03-02 -0.068554 NaN
1990-03-05 -0.056425 -0.062490
1990-03-06 -0.022294 -0.049091
1990-03-07 -0.038996 -0.046567
1990-03-08 -0.026863 -0.042626
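The same idea can be wrapped in a small helper. This is only a sketch: the function name `rolling_mean_counting_nan` and the variable `rows_in_window` are made up for illustration, and it assumes a fixed integer window over consecutive rows (not a time-offset window). It computes the mean over the non-NaN values in each window, then masks every position where fewer than min_periods rows, NaN or not, are available.

```python
import numpy as np
import pandas as pd

def rolling_mean_counting_nan(s, window, min_periods):
    """Rolling mean where NaN rows count toward window and min_periods.

    Hypothetical helper: assumes an integer, row-based window.
    """
    # Mean of the non-NaN values in each window (NaN if the window has none).
    means = s.rolling(window, min_periods=1).mean()
    # Rows available in each window, NaN or not: 1, 2, ..., window, window, ...
    rows_in_window = np.minimum(np.arange(1, len(s) + 1), window)
    # Keep a mean only where enough rows (of any kind) are in the window.
    return means.where(rows_in_window >= min_periods)

idx = pd.to_datetime(['1990-02-28', '1990-03-01', '1990-03-02', '1990-03-05',
                      '1990-03-06', '1990-03-07', '1990-03-08'])
df = pd.DataFrame(
    {'U': [np.nan, np.nan, -0.068554, -0.056425, -0.022294, -0.038996, -0.026863]},
    index=idx,
)

df['rolling_mean'] = rolling_mean_counting_nan(df['U'], window=5, min_periods=4)
print(df)
```

For this row-based case the mask reduces to blanking the first min_periods - 1 rows, which is what the `.iloc[mp - 1:]` slice does. The explicit row count just makes the rule visible.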
Example Code
import pandas as pd
data = {'U': [None, None, -0.068554, -0.056425, -0.022294, -0.038996, -0.026863]}
idx = pd.to_datetime(['1990-02-28', '1990-03-01', '1990-03-02', '1990-03-05', '1990-03-06', '1990-03-07', '1990-03-08'])
df = pd.DataFrame(data, index=idx)