python, pandas, numpy

Count NaNs in window size and min_periods of rolling Pandas function


We have the following pandas dataframe:

                   U
date                
1990-02-28       NaN
1990-03-01       NaN
1990-03-02 -0.068554
1990-03-05 -0.056425
1990-03-06 -0.022294
1990-03-07 -0.038996
1990-03-08 -0.026863

I want to compute the rolling mean of column 'U' with window size = 5 and min_periods = 4. The built-in pandas rolling function counts only non-NaN values toward min_periods. Instead, I would like NaN rows to count toward both the window size and min_periods, without affecting the calculation of the mean itself. This is the expected output:

                   U  rolling_mean
date                              
1990-02-28       NaN           NaN
1990-03-01       NaN           NaN
1990-03-02 -0.068554           NaN
1990-03-05 -0.056425     -0.062489
1990-03-06 -0.022294     -0.049091
1990-03-07 -0.038996     -0.046567
1990-03-08 -0.026863     -0.042626

Is there any way to accomplish this without loops? Thanks.
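To make the problem concrete, here is a minimal sketch of what the default call returns on the sample data. It rebuilds the frame from the values shown above; the variable name `default_mean` is only illustrative.

```python
import pandas as pd

nan = float('nan')
idx = pd.to_datetime(['1990-02-28', '1990-03-01', '1990-03-02', '1990-03-05',
                      '1990-03-06', '1990-03-07', '1990-03-08'])
df = pd.DataFrame({'U': [nan, nan, -0.068554, -0.056425,
                         -0.022294, -0.038996, -0.026863]}, index=idx)

# Default behaviour: a value is produced only once the 5-row window
# contains at least 4 non-NaN observations.
default_mean = df['U'].rolling(5, min_periods=4).mean()
print(default_mean)
```

With this call the first five rows come out as NaN, because the window does not hold four non-NaN values until 1990-03-07, whereas the expected output above starts at 1990-03-05.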


Solution

  • Code

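    Set min_periods to 0 so that NaN values never block the mean, then keep only the rows from position mp - 1 onward. When the sliced result is assigned back, the first three rows are left as NaN.

    mp = 4  # desired min_periods, counting NaN rows
    df['rolling_mean'] = df['U'].rolling(5, min_periods=0).mean().iloc[mp - 1:]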
    

    df:

                       U  rolling_mean
    1990-02-28       NaN           NaN
    1990-03-01       NaN           NaN
    1990-03-02 -0.068554           NaN
    1990-03-05 -0.056425     -0.062490
    1990-03-06 -0.022294     -0.049091
    1990-03-07 -0.038996     -0.046567
    1990-03-08 -0.026863     -0.042626
    

    Example Code

    import pandas as pd

    # Sample data from the question; None is stored as NaN in the float column
    data = {'U': [None, None, -0.068554, -0.056425, -0.022294, -0.038996, -0.026863]}
    idx = pd.to_datetime(['1990-02-28', '1990-03-01', '1990-03-02', '1990-03-05', '1990-03-06', '1990-03-07', '1990-03-08'])
    df = pd.DataFrame(data, index=idx)
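
As a possible variant of the same idea, the positional slice can be replaced by a mask that counts how many rows (NaN or not) each window spans. This is only a sketch under the assumptions of the question (fixed window of 5, at least 4 rows required); the names `window`, `mean` and `rows_in_window` are illustrative and do not come from the original answer.

```python
import pandas as pd

nan = float('nan')
idx = pd.to_datetime(['1990-02-28', '1990-03-01', '1990-03-02', '1990-03-05',
                      '1990-03-06', '1990-03-07', '1990-03-08'])
df = pd.DataFrame({'U': [nan, nan, -0.068554, -0.056425,
                         -0.022294, -0.038996, -0.026863]}, index=idx)

window, mp = 5, 4  # window size and required number of rows (NaN rows included)

# Mean over whatever non-NaN values each window holds.
mean = df['U'].rolling(window, min_periods=1).mean()

# Number of rows each window spans, counting NaN rows as well.
rows_in_window = pd.Series(1, index=df.index).rolling(window, min_periods=1).sum()

# Keep the mean only where the window spans at least `mp` rows.
df['rolling_mean'] = mean.where(rows_in_window >= mp)
print(df)
```

On the sample data this should give the same column as the slice-based version. The mask makes the "at least mp rows" condition explicit rather than relying on row position, which may be easier to adapt if the series does not start at the first row of the frame.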