Calculating rolling Root Mean Square Error in Python

Suppose I have a pandas DataFrame, vols, that looks like this:


              Return       Vol
2019-12-26  0.002291  0.002400
2019-12-27  0.002292  0.002392
2019-12-30  0.002288  0.002385
2019-12-31  0.002288  0.002378
2020-01-01  0.002286  0.002378

Next, I rename the columns of vols:

vols.columns = ['Realized', 'Predicted']
vols.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 922 entries, 2019-12-26 to 2023-07-27
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Realized   922 non-null    float64
 1   Predicted  922 non-null    float64
dtypes: float64(2)

I want to calculate the rolling Root Mean Square Error (RMSE) between the two columns:

vols_rolling = vols.rolling(window=52)

from sklearn.metrics import mean_squared_error as mse

vols_rolling.apply(lambda x: mse(x['Realized'], x['Predicted']))

However, this raises a ValueError.

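  • The issue is that rolling.apply works on one column at a time, so the function only ever receives a window of a single column. Your function needs to access two columns simultaneously, which is why x['Realized'] fails.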

    You can cheat: roll over a single Series, use the index of each window, and slice the external DataFrame with it:

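    from sklearn.metrics import mean_squared_error as mse
    # min_periods=1 allows windows with fewer than 52 rows at the start
    vols_rolling = vols.rolling(window=52, min_periods=1)
    # x is one window of 'Realized'; its index selects the same rows of both columns
    vols_rolling['Realized'].apply(lambda x: mse(vols.loc[x.index, 'Realized'], vols.loc[x.index, 'Predicted']))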


    2019-12-26    1.188100e-08
    2019-12-27    1.094050e-08
    2019-12-30    1.043000e-08
    2019-12-31    9.847500e-09
    2020-01-01    9.570800e-09
    Name: Realized, dtype: float64
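
Note that the values above are rolling mean squared errors, not yet root mean squared errors; the square root still has to be taken. Since the MSE is just the mean of the squared differences, the same result can also be obtained without apply by rolling over the squared error directly. The following is a minimal sketch rather than a definitive implementation: it assumes vols has the renamed columns from the question, rebuilds only the five sample rows shown there, and uses the names squared_error, rolling_mse and rolling_rmse purely for illustration.

```python
import numpy as np
import pandas as pd

# Sample rows from the question, with the renamed columns (illustrative subset only)
vols = pd.DataFrame(
    {
        "Realized": [0.002291, 0.002292, 0.002288, 0.002288, 0.002286],
        "Predicted": [0.002400, 0.002392, 0.002385, 0.002378, 0.002378],
    },
    index=pd.to_datetime(
        ["2019-12-26", "2019-12-27", "2019-12-30", "2019-12-31", "2020-01-01"]
    ),
)

# Element-wise squared error between the two columns
squared_error = (vols["Realized"] - vols["Predicted"]) ** 2

# Rolling mean of the squared error is the rolling MSE
rolling_mse = squared_error.rolling(window=52, min_periods=1).mean()

# The square root of the rolling MSE is the rolling RMSE
rolling_rmse = np.sqrt(rolling_mse)
print(rolling_rmse)
```

With min_periods=1 the first windows hold fewer than 52 rows, which matches the output above; drop that argument if you would rather get NaN until a full window is available.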