Tags: python, time-series, forecasting, arch, sktime

How to forecast out-of-sample values with sktime SquaringResiduals?


I am trying to forecast out-of-sample values using sktime SquaringResiduals. Here is the code, which works well for in-sample prediction:

import numpy as np
from sktime.forecasting.arch import StatsForecastGARCH
from sktime.forecasting.base import ForecastingHorizon as FH
from sktime.forecasting.squaring_residuals import SquaringResiduals
from sktime.utils.plotting import plot_series

# df (monthly passenger data), forecaster (the TBATS model below) and
# error_matrix (my own metrics helper) are defined elsewhere
def hybridModel(p, q, model):
    out_sample_date = FH(np.arange(12), is_relative=True)  # steps 0..11 ahead
    in_sample_date = FH(df.index, is_relative=False)        # every observed period
    var_fc = StatsForecastGARCH(p=p, q=q)
    sqr = SquaringResiduals(forecaster=model, residual_forecaster=var_fc, initial_window=int(len(df)))
    sqr = sqr.fit(df, fh=in_sample_date)
    # y_pred2 = sqr.predict(out_sample_date)  # out-of-sample prediction
    y_pred = sqr.predict(in_sample_date)  # in-sample prediction
    fig, ax = plot_series(df, y_pred, labels=["passenger", "y_pred"])
    return sqr, fig, y_pred, error_matrix(df, y_pred)

sqr, fig1, y_pred1, matrix1 = hybridModel(1, 1, forecaster)

Now I try to forecast out-of-sample with y_pred2 = sqr.predict(out_sample_date), which raises:


> ValueError: A different forecasting horizon `fh` has been provided
> from the one seen already in `fit`, in this instance of
> SquaringResiduals. If you want to change the forecasting horizon,
> please re-fit the forecaster. This is because the fitting of the
> forecaster SquaringResiduals depends on `fh`.

So I changed sqr = sqr.fit(df, fh=in_sample_date) to sqr = sqr.fit(df), which raises:


> ValueError: The forecasting horizon `fh` must be passed to `fit` of
> SquaringResiduals, but none was found. This is because fitting of the
> forecaster SquaringResiduals depends on `fh`.
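The two messages pull in opposite directions because SquaringResiduals freezes the horizon at fit time: fit needs fh, and predict then accepts only that same fh. The toy class below is hypothetical (FhAtFitForecaster is not sktime code) and only mimics that contract in plain Python, assuming the behaviour described by the two error messages; it shows that the only way to change the horizon is to call fit again.

```python
class FhAtFitForecaster:
    """Hypothetical toy, NOT sktime code: fit depends on the forecasting horizon."""

    def fit(self, y, fh=None):
        if fh is None:
            raise ValueError("fh must be passed to fit")
        self._y = list(y)
        self._fh = tuple(fh)  # the horizon is frozen at fit time
        return self

    def predict(self, fh=None):
        if fh is not None and tuple(fh) != self._fh:
            raise ValueError("different fh from the one seen in fit; re-fit first")
        # placeholder forecast: repeat the last observed value for each step
        return [self._y[-1]] * len(self._fh)


y = [1.0, 2.0, 3.0]
f = FhAtFitForecaster().fit(y, fh=[1, 2])
print(f.predict([1, 2]))  # [3.0, 3.0]
try:
    f.predict([1, 2, 3])  # a new horizon without re-fitting
except ValueError as err:
    print("error:", err)
# re-fitting with the new horizon is the only way to change it
print(f.fit(y, fh=[1, 2, 3]).predict([1, 2, 3]))  # [3.0, 3.0, 3.0]
```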

Then I changed it to sqr = sqr.fit(df, fh=out_sample_date), which raises:


> ValueError: The `window_length` and the forecasting horizon are
> incompatible with the length of `y`. Found `window_length`=84,
> `max(fh)`=11, but len(y)=84. It is required that the window length
> plus maximum forecast horizon is smaller than the length of the time
> series `y` itself.
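This last message states the constraint numerically. A minimal sketch of that arithmetic in plain Python; window_fits is a hypothetical helper, and it assumes that window length plus max(fh) may equal len(y) but not exceed it (the message says "smaller than", so treat the equality case as an assumption).

```python
def window_fits(window_length: int, max_fh: int, n_obs: int) -> bool:
    # Assumed reading of the check: it fails only when
    # window_length + max(fh) is larger than len(y)
    return window_length + max_fh <= n_obs


n_obs = 84               # len(y) in the error message
max_fh = max(range(12))  # like np.arange(12): horizons 0..11, so max(fh) is 11
print(window_fits(84, max_fh, n_obs))  # False, since 84 + 11 > 84
largest_window = n_obs - max_fh
print(largest_window, window_fits(largest_window, max_fh, n_obs))  # 73 True
```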


Then I checked predict() with another model: for a non-hybrid model, predict() works well for both in-sample and out-of-sample prediction:

import warnings

import numpy as np
from sktime.forecasting.base import ForecastingHorizon as FH
from sktime.forecasting.tbats import TBATS
from sktime.utils.plotting import plot_series

warnings.filterwarnings("ignore")

# df is the same monthly passenger DataFrame used above
out_sample_date = FH(np.arange(12), is_relative=True)
in_sample_date = FH(df.index, is_relative=False)
forecaster = TBATS(
    use_box_cox=True,
    use_trend=True,
    use_damped_trend=True,
    sp=12,
    use_arma_errors=True,
    n_jobs=1)
forecaster.fit(df)  # no fh needed at fit time for TBATS
y_pred = forecaster.predict(in_sample_date)    # in-sample
y_pred2 = forecaster.predict(out_sample_date)  # out-of-sample, same fitted model
fig, ax = plot_series(df, y_pred, y_pred2, labels=['passenger', 'prediction', 'out_sample_pred'])


Why don't in-sample and out-of-sample prediction work together for SquaringResiduals, and how can I predict both in-sample and out-of-sample values with it?

sqr = SquaringResiduals(forecaster=model, residual_forecaster=var_fc,initial_window=int(len(df))) 
sqr = sqr.fit(df, fh=in_sample_date)  
y_pred2 = sqr.predict(out_sample_date) #out sample prediction 

Thank you so much for your attention.


Solution

  • The documentation explains that the forecaster is trained on y(t_1), ..., y(t_i) for i = initial_window, ..., N - steps_ahead, and that each of these fits is used to calculate the residual r(t_(i+steps_ahead)) := y(t_(i+steps_ahead)) - ŷ(t_(i+steps_ahead)).

    initial_window must be less than or equal to N - steps_ahead for any forecast with a positive steps_ahead to be made. I believe the reason is as follows. Let initial_window = N - s with s >= 0, and let steps_ahead = a. Then the first iteration of the loop over i (i = N - s) gives:

    r(t_(i+steps_ahead)) := y(t_(i+steps_ahead)) - ŷ(t_(i+steps_ahead))
    r(t_(N-s+a)) := y(t_(N-s+a)) - ŷ(t_(N-s+a))
    

    Notice that y(t_(N-s+a)) is only known if N-s+a <= N, or equivalently a <= s, because we don't know the true values at future timestamps.

    This means that with SquaringResiduals the largest initial window you can supply is max_initial_window = len(df) - max(out_sample_date). Note that we use max(out_sample_date) rather than len(out_sample_date): np.arange(12) only asks for steps_ahead = 0, ..., 11, i.e. a maximum forecast horizon of 11. Also, because fitting SquaringResiduals depends on fh, you have to re-fit it with the out-of-sample horizon before predicting that horizon, as the first error message says.

    Below is a fully reproducible example:

    import warnings

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    from sktime.forecasting.arch import StatsForecastGARCH
    from sktime.forecasting.base import ForecastingHorizon as FH
    from sktime.forecasting.squaring_residuals import SquaringResiduals
    from sktime.forecasting.tbats import TBATS
    from sktime.utils.plotting import plot_series
    warnings.filterwarnings("ignore")
    
    ## make up some random data: 84 monthly observations, 2012-01 to 2018-12
    np.random.seed(42)
    dates = pd.period_range(start='2012-01', end='2018-12', freq='M')
    passengers = 40 + 10*np.sin(np.linspace(-np.pi, np.pi, len(dates))) + np.random.normal(loc=0, scale=2, size=len(dates))
    df = pd.DataFrame(data={"passenger": passengers}, index=dates)
    
    def hybridModel(p,q,model):
        out_sample_date = FH(np.arange(12), is_relative=True)  # steps_ahead 0..11
        in_sample_date = FH(df.index, is_relative=False)        # every observed month
        
        ## <-- max initial window cannot be any larger! (84 - 11 = 73 here)
        max_initial_window = int(len(df) - out_sample_date.to_numpy().max())
        var_fc = StatsForecastGARCH(p=p,q=q)
        sqr = SquaringResiduals(forecaster=model, residual_forecaster=var_fc, initial_window=max_initial_window)
        sqr = sqr.fit(df, fh=in_sample_date)
        y_pred = sqr.predict(in_sample_date) #in sample prediction
        
        # fitting depends on fh, so re-fit with the out-of-sample horizon before predicting it
        sqr = sqr.fit(df, fh=out_sample_date)
        y_pred2 = sqr.predict(out_sample_date) #out sample prediction
    
        fig,ax=plot_series(df, y_pred, y_pred2, labels=["passenger", "y_pred", "y_pred2"])
        return sqr,fig,y_pred,y_pred2
    
    forecaster = TBATS(
        use_box_cox=True,
        use_trend=True,
        use_damped_trend=True,
        sp=12,
        use_arma_errors=True,
        n_jobs=1)
    
    # no separate forecaster.fit(df) needed: sqr.fit fits the wrapped forecaster
    sqr,fig1,y_pred1,y_pred2 = hybridModel(1,1,forecaster)
    plt.show()
    

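The residual loop described in the solution can be sketched in plain NumPy. This is a simplified model under stated assumptions, not sktime's implementation: expanding_window_residuals and naive_last_value are hypothetical names, the wrapped model is replaced by a last-value forecaster, and only a single steps_ahead is handled. It refits on each expanding window y(t_1), ..., y(t_i) and compares the forecast with the observed y(t_(i+steps_ahead)), so it raises as soon as initial_window + steps_ahead exceeds N.

```python
import numpy as np


def expanding_window_residuals(y, initial_window, steps_ahead, forecast):
    """Hypothetical sketch of the residual loop (not sktime's implementation).

    For i = initial_window, ..., N - steps_ahead, fit on y(t_1), ..., y(t_i)
    and take r(t_(i+steps_ahead)) = y(t_(i+steps_ahead)) - forecast.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    if initial_window + steps_ahead > n:
        # the first target y(t_(initial_window+steps_ahead)) lies in the future
        raise ValueError("initial_window + steps_ahead exceeds len(y)")
    residuals = []
    for i in range(initial_window, n - steps_ahead + 1):
        y_hat = forecast(y[:i], steps_ahead)  # y[:i] holds t_1, ..., t_i
        # the 0-based index of t_(i+steps_ahead) is i + steps_ahead - 1
        residuals.append(y[i + steps_ahead - 1] - y_hat)
    return np.array(residuals)


def naive_last_value(history, steps_ahead):
    # Hypothetical stand-in forecaster: repeat the last observed value
    return history[-1]


y = np.arange(10, dtype=float)  # N = 10 observations: 0.0, 1.0, ..., 9.0
r = expanding_window_residuals(y, initial_window=5, steps_ahead=2, forecast=naive_last_value)
print(r)  # [2. 2. 2. 2.]
```

With N = 10, initial_window = 5 and steps_ahead = 2 the loop runs for i = 5, ..., 8 and yields four residuals; initial_window = 9 would need y(t_11), which does not exist, so the sketch raises just as the window check in sktime does.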