
sktime ARIMA invalid frequency


I am trying to fit an ARIMA model from the sktime package. I import a dataset and convert it to a pandas Series. Then I fit the model on the training sample, and when I try to predict, an error occurs.

from sktime.forecasting.base import ForecastingHorizon
from sktime.forecasting.model_selection import temporal_train_test_split
from sktime.forecasting.arima import ARIMA
import numpy as np, pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv',
                 parse_dates=['date']).set_index('date').T.iloc[0]
p, d, q = 3, 1, 2
y_train, y_test = temporal_train_test_split(df, test_size=24)
model = ARIMA((p, d, q))
results = model.fit(y_train)
fh = ForecastingHorizon(y_test.index, is_relative=False,)

# the error is raised here
y_pred_vals, y_pred_int = results.predict(fh, return_pred_int=True)

The error message is the following:

ValueError: Invalid frequency. Please select a frequency that can be converted to a regular
`pd.PeriodIndex`. For other frequencies, basic arithmetic operation to compute durations
currently do not work reliably.

I tried using .asfreq("M") while reading the dataset; however, all the values in the series became NaN.
Interestingly, this code works with the built-in load_airline dataset from sktime.datasets, but not with my dataset from GitHub.
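A likely explanation for the NaNs, assuming the dates in the CSV fall on the first day of each month: .asfreq("M") asks for month-end timestamps, and asfreq reindexes the series rather than relabelling it, so none of the new month-end labels match an existing row. The sketch below illustrates this on a small synthetic series (not the a10 data). It passes pandas offset objects instead of the "M"/"MS" strings because newer pandas releases renamed the month-end alias; the month-start version corresponds to .asfreq("MS").

```python
import numpy as np
import pandas as pd

# Toy monthly series stamped on the first day of each month.
# Assumption: this mirrors how the a10 dates look; the values are synthetic.
idx = pd.to_datetime(["1991-07-01", "1991-08-01", "1991-09-01", "1991-10-01"])
y = pd.Series(np.arange(4, dtype=float), index=idx)

# The parsed index carries no frequency, although pandas can infer one.
print(y.index.freq)            # None
print(pd.infer_freq(y.index))  # MS (month start)

# Month-end frequency: asfreq reindexes onto 1991-07-31, 1991-08-31, ...,
# none of which exist in the original index, so every value becomes NaN.
month_end = y.asfreq(pd.offsets.MonthEnd())
print(month_end.isna().all())  # True

# Month-start frequency matches the existing timestamps and keeps the values.
month_start = y.asfreq(pd.offsets.MonthBegin())
print(month_start.tolist())    # [0.0, 1.0, 2.0, 3.0]
```

So if you want to keep a DatetimeIndex, the month-start alias is the one that fits this kind of data; the answer below takes a different route and switches to periods.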


Solution

  • I get a different error, ValueError: ``unit`` missing, possibly due to a version difference. In any case, I'd say it is better to index your series with a pd.PeriodIndex instead of a pd.DatetimeIndex. I think the former is more explicit (e.g., a monthly series has periods as its time steps, not exact dates) and works more smoothly. So after reading the CSV, the line

    df.index = pd.PeriodIndex(df.index, freq="M")
    

    should clear the error (it does in my version, 0.5.1).
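As a pandas-only sketch of what that conversion does, here it is applied to a synthetic month-start series (illustrative data, not the real CSV). It checks that pd.PeriodIndex(..., freq="M") matches DatetimeIndex.to_period("M"), and shows that period arithmetic steps cleanly by whole months, which is the kind of regular index the sktime error message asks for. As far as I know, sktime's own load_airline also returns a series with a monthly PeriodIndex, which may be why the same code works there. Note that "M" remains the alias for monthly periods; the month-end rename in recent pandas concerns timestamp offsets.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series on month-start timestamps (illustrative data only).
idx = pd.to_datetime(["1991-07-01", "1991-08-01", "1991-09-01", "1991-10-01"])
y = pd.Series(np.arange(4, dtype=float), index=idx)

# The suggested conversion: label each observation with its calendar month.
y_period = y.copy()
y_period.index = pd.PeriodIndex(y.index, freq="M")

# Equivalent spelling using the DatetimeIndex method.
assert y_period.index.equals(y.index.to_period("M"))

print(y_period.index.freqstr)  # M
print(y_period.index[-1] + 1)  # 1991-11 (the next month)

# A short out-of-sample horizon built directly from periods.
future = pd.period_range(y_period.index[-1] + 1, periods=3, freq="M")
print(list(future.astype(str)))  # ['1991-11', '1991-12', '1992-01']
```

If you are on a newer sktime release, predict may no longer accept return_pred_int=True; later versions expose prediction intervals through a separate predict_interval method, so check the documentation for the version you have installed.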