How to solve LinAlgError and ValueError when training an ARIMA model with Python


I am trying to fit a time series model and am getting exceptions that tell me nothing. I wonder whether I am making a mistake or whether this is expected. Details follow.

When training my model, I run a grid search to find the best (p, d, q) settings. The complete code is below, and I explain what it does after the output.

The reproducible code is essentially a copy from https://machinelearningmastery.com/grid-search-arima-hyperparameters-with-python/, with some slight changes:

import warnings
import numpy as np  # needed for np.array in the dataset below
from pandas import Series
from statsmodels.tsa.arima_model import ARIMA
from sklearn.metrics import mean_squared_error

# evaluate an ARIMA model for a given order (p,d,q)
def evaluate_arima_model(X, arima_order):
    # prepare training dataset
    train_size = int(len(X) * 0.66)
    train, test = X[0:train_size], X[train_size:]
    history = [x for x in train]
    # make predictions
    predictions = list()
    for t in range(len(test)):
        model = ARIMA(history, order=arima_order)
        model_fit = model.fit(disp=0)
        yhat = model_fit.forecast()[0]
        predictions.append(yhat)
        history.append(test[t])
    # calculate out of sample error
    error = mean_squared_error(test, predictions)
    return error

# evaluate combinations of p, d and q values for an ARIMA model
def evaluate_models(dataset, p_values, d_values, q_values):
    dataset = dataset.astype('float64')
    best_score, best_cfg = float("inf"), None
    for p in p_values:
        for d in d_values:
            for q in q_values:
                order = (p,d,q)
                try:
                    print("Evaluating the settings: ", p, d, q)
                    mse = evaluate_arima_model(dataset, order)
                    if mse < best_score:
                        best_score, best_cfg = mse, order
                    print('ARIMA%s MSE=%.3f' % (order,mse))
                except Exception as exception:
                    print("Exception occured...", type(exception).__name__, "\n", exception)

    print('Best ARIMA%s MSE=%.3f' % (best_cfg, best_score))

# dataset
values = np.array([-1.45, -9.04, -3.64, -10.37, -1.36, -6.83, -6.01, -3.84, -9.92, -5.21,
                   -8.97, -6.19, -4.12, -11.03, -2.27, -4.07, -5.08, -4.57, -7.87, -2.80,
                   -4.29, -4.19, -3.76, -22.54, -5.87, -6.39, -4.19, -2.63, -8.70, -3.52, 
                   -5.76, -1.41, -6.94, -12.95, -8.64, -7.21, -4.05, -3.01])

# evaluate parameters
p_values = [7, 8, 9, 10]
d_values = range(0, 3)
q_values = range(0, 3)
warnings.filterwarnings("ignore")
evaluate_models(values, p_values, d_values, q_values)

And here is the output (truncated, but enough to show the problem):

Evaluating the settings:  7 0 0
Exception occured... LinAlgError 
 SVD did not converge
Evaluating the settings:  7 0 1
Exception occured... LinAlgError 
 SVD did not converge
Evaluating the settings:  7 0 2
Exception occured... ValueError 
 The computed initial AR coefficients are not stationary
You should induce stationarity, choose a different model order, or you can
pass your own start_params.
Evaluating the settings:  7 1 0
Exception occured... LinAlgError 
 SVD did not converge
Evaluating the settings:  7 1 1
Exception occured... ValueError 
 The computed initial AR coefficients are not stationary
You should induce stationarity, choose a different model order, or you can
pass your own start_params.
Evaluating the settings:  7 1 2
Exception occured... ValueError 
 The computed initial AR coefficients are not stationary
You should induce stationarity, choose a different model order, or you can
pass your own start_params.
Evaluating the settings:  7 2 0
Exception occured... LinAlgError 
 SVD did not converge
Evaluating the settings:  7 2 1
Exception occured... ValueError 
 The computed initial AR coefficients are not stationary
You should induce stationarity, choose a different model order, or you can
pass your own start_params.
Evaluating the settings:  7 2 2
Exception occured... ValueError 
 The computed initial AR coefficients are not stationary
You should induce stationarity, choose a different model order, or you can
pass your own start_params.

The code simply tries each of the given settings, trains the model, calculates the MSE (mean squared error) for that setting, and then selects the best one (the one with the minimum MSE).

But during training, the code keeps throwing LinAlgError and ValueError exceptions, which tell me nothing.

As far as I can follow, when one of these exceptions is thrown the code does not actually train that setting; it just jumps to the next setting to be tried.

Why do I see these exceptions? Can they be ignored? What do I need to do to solve this?


Solution

  • First, to answer your specific question: I think the "SVD did not converge" error is a bug in the ARIMA model of Statsmodels. The SARIMAX model is better supported these days (and does everything the ARIMA model does, plus more), so I would recommend using that instead. To do so, replace the model creation with:

    import statsmodels.api as sm  # needed for the sm.tsa namespace
    model = sm.tsa.SARIMAX(history, trend='c', order=arima_order, enforce_stationarity=False, enforce_invertibility=False)
    

    That said, I think you are still unlikely to get good results given your time series and the specifications you are trying.

    In particular, your time series is very short, and you are only considering extremely long autoregressive lag lengths (p > 6). It will be difficult to estimate that many parameters with so few data points, particularly when you also have integration (d = 1 or d = 2) and when you also add in moving average components. I suggest that you re-evaluate which models you are considering.
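
To make the point about parameter count concrete, here is a back-of-the-envelope sketch. It assumes the 66% training split from the question and counts a constant and the innovation variance alongside the AR and MA coefficients; the helper name `count_arima_parameters` is hypothetical, and the (1, 0, 1) order is included only for contrast.

```python
def count_arima_parameters(p, q, with_constant=True):
    """AR coefficients + MA coefficients + optional constant + innovation variance."""
    return p + q + (1 if with_constant else 0) + 1


n_total = 38                          # length of the series in the question
train_size = int(n_total * 0.66)      # observations available for the first fit

for order in [(7, 0, 0), (10, 2, 2), (1, 0, 1)]:
    p, d, q = order
    effective_obs = train_size - d    # each round of differencing drops one observation
    n_params = count_arima_parameters(p, q)
    print("order=%s: %d observations for %d parameters" % (order, effective_obs, n_params))
```

For the largest order in the grid, (10, 2, 2), the first fit has 23 differenced observations from which to estimate 14 parameters, which is why the estimates are so unstable.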