python pandas apply rolling-computation

Calculating a rolling Root Mean Square Error in Python


Suppose I have a pandas DataFrame, vols:

vols.head()

             Return     Vol
DataDate        
2019-12-26  0.002291    0.002400
2019-12-27  0.002292    0.002392
2019-12-30  0.002288    0.002385
2019-12-31  0.002288    0.002378
2020-01-01  0.002286    0.002378

Next, I rename the columns of vols.

vols.columns = ['Realized', 'Predicted']
vols.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 922 entries, 2019-12-26 to 2023-07-27
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Realized   922 non-null    float64
 1   Predicted  922 non-null    float64
dtypes: float64(2)

I want to calculate a rolling Root Mean Square Error (RMSE).

vols_rolling = vols.rolling(window=52)

from sklearn.metrics import mean_squared_error as mse

vols_rolling.apply(lambda x: mse(x['Realized'], x['Predicted']))

I am getting the following ValueError.

ValueError                                Traceback (most recent call last)
File ~\anaconda3\Lib\site-packages\pandas\_libs\tslibs\parsing.pyx:440, in pandas._libs.tslibs.parsing.parse_datetime_string_with_reso()

File ~\anaconda3\Lib\site-packages\pandas\_libs\tslibs\parsing.pyx:649, in pandas._libs.tslibs.parsing.dateutil_parse()

ValueError: Unknown datetime string format, unable to parse: Realized

During handling of the above exception, another exception occurred:

The full traceback is quite long, so I have not pasted the rest of it here.


Solution

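  • The issue is that rolling.apply works on one column at a time, but your function needs to access two columns simultaneously. Each window arrives as a Series indexed by date, so x['Realized'] is looked up in the DatetimeIndex, which is why pandas tries to parse 'Realized' as a datetime string.

    As a workaround, you can roll over a single column, use each window only for its index, and slice the original DataFrame with that index: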

    from sklearn.metrics import mean_squared_error as mse
    
    vols_rolling = vols.rolling(window=52, min_periods=1)
    
    vols_rolling['Realized'].apply(lambda x: mse(vols.loc[x.index, 'Realized'], vols.loc[x.index, 'Predicted']))
    

    Output (these values are mean squared errors; take the square root to get the RMSE):

    DataDate
    2019-12-26    1.188100e-08
    2019-12-27    1.094050e-08
    2019-12-30    1.043000e-08
    2019-12-31    9.847500e-09
    2020-01-01    9.570800e-09
    Name: Realized, dtype: float64
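
Since the question asks for the root mean square error and mean_squared_error returns the MSE, a square root is still needed. One possible alternative, sketched below, avoids rolling.apply entirely: square the per-row error first, take a rolling mean, then take the square root. The sample frame only mirrors the five rows shown by vols.head(), and rolling_rmse is a hypothetical helper name, not part of pandas or scikit-learn.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: only the five rows shown by vols.head() in the question,
# already renamed to 'Realized' and 'Predicted'.
vols = pd.DataFrame(
    {
        "Realized": [0.002291, 0.002292, 0.002288, 0.002288, 0.002286],
        "Predicted": [0.002400, 0.002392, 0.002385, 0.002378, 0.002378],
    },
    index=pd.to_datetime(
        ["2019-12-26", "2019-12-27", "2019-12-30", "2019-12-31", "2020-01-01"]
    ),
)
vols.index.name = "DataDate"


def rolling_rmse(df, window=52, min_periods=1):
    """Rolling RMSE between df['Realized'] and df['Predicted'] (hypothetical helper)."""
    # Squared error per row; this is a single Series, so rolling() can handle it directly.
    squared_error = (df["Realized"] - df["Predicted"]) ** 2
    # Rolling mean of the squared errors is the rolling MSE; its square root is the RMSE.
    rolling_mse = squared_error.rolling(window=window, min_periods=min_periods).mean()
    return np.sqrt(rolling_mse)


rmse = rolling_rmse(vols)
print(rmse)
```

With the same window and min_periods, squaring these values should reproduce the MSE figures shown above. Because nothing is sliced inside a Python lambda, this form also tends to be faster on long series.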