python time-series google-colaboratory forecasting pycaret

Time series forecasting with pycaret hangs in compare_models


I am trying to build a time series forecast in Google Colab with the pycaret AutoML package, using the data at the following link: parts_revenue_data. When I call compare_models to find the best model, the code hangs and the progress stays at 20%.


The full code is below:

# Only enable critical logging (Optional)
import os
os.environ["PYCARET_CUSTOM_LOGGING_LEVEL"] = "CRITICAL"

def what_is_installed():
    from pycaret import show_versions
    show_versions()

try:
    what_is_installed()
except ModuleNotFoundError:
    !pip install pycaret
    what_is_installed()

import pandas as pd
import numpy as np
import pycaret
pycaret.__version__ # 3.1.0

df = pd.read_csv('parts_revenue.csv', delimiter=';')

from pycaret.utils.time_series import clean_time_index

cleaned = clean_time_index(data=df,
                           index_col='Posting Date',
                           freq='D')

# Verify the resulting DataFrame
print(cleaned.head(n=50))

# parts['MA12'] = parts['Parts Revenue'].rolling(12).mean()


# import plotly.express as px
# fig = px.line(parts, x="Posting Date", y=["Parts Revenue", 
#                "MA12"], template = 'plotly_dark')
# fig.show()

import time

from pycaret.time_series import *

# We want to forecast the next 12 days of data and we will use 3 
# fold cross-validation to test the models.
fh = 12 # or alternately fh = np.arange(1,13)
fold = 3

# Global Figure Settings for notebook ----
# Depending on whether you are using jupyter notebook, jupyter lab, 
# Google Colab, you may have to set the renderer appropriately
# NOTE: Setting to a static renderer here so that the notebook 
# saved size is reduced.
fig_kwargs = {
              # "renderer": "notebook",
              "renderer": "png",
              "width": 1000,
              "height": 600,
             }

"""## EDA"""

eda = TSForecastingExperiment()
eda.setup(cleaned,
          fh=fh,
          numeric_imputation_target = 0,
          fig_kwargs=fig_kwargs
        )

eda.plot_model()


eda.plot_model(plot="diagnostics",
               fig_kwargs={"height": 800, "width": 1000}
              )

eda.plot_model(
    plot="diff",
    data_kwargs={
        "lags_list": [[1], [1, 7]],
        "acf": True,
        "pacf": True,
        "periodogram": True,
    },
    fig_kwargs={"height": 800, "width": 1500},
)


"""## Modeling"""

exp = TSForecastingExperiment()
exp.setup(data = cleaned,
          fh=fh,
          numeric_imputation_target = 0.0,
          fig_kwargs=fig_kwargs,
          seasonal_period = 5
      )

# compare baseline models (the experiment object defined above is `exp`)
best = exp.compare_models(errors = 'raise') # CODE HANGS HERE!

# plot the forecast for the next 24 periods (days, given freq='D')
exp.plot_model(best,
               plot = 'forecast',
               data_kwargs = {'fh' : 24}
           )

Is this caused by a bug in pycaret, or is something wrong with my code?


Solution

  • Note: I do not have enough rep to comment, so I'll post this quasi-workaround as an answer. I can delete it later if needed, or move it to a comment once I have sufficient rep.

    I have also found compare_models for time series to be unusually slow (over 10 minutes of runtime on a dataset with ~4000 records) on my MacBook Pro with an M1 Max. I have not tried it in Colab.

    After noticing that it was hanging on the Auto ARIMA model, I excluded that model from the comparison as shown below. This reduced the run time to roughly 1 minute.

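    # compare baseline models, skipping Auto ARIMA
    # (`exp` is the experiment from the question; `exclude` takes a list of model IDs)
    best = exp.compare_models(errors="raise", exclude=["auto_arima"])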
    

    While I'm aware this is not a fix per se, perhaps it can help you get unblocked.

    Environment details:

    • Python 3.10.12
    • pycaret==3.1.0
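
One way to check which model is stalling, rather than guessing, is to fit the candidates one at a time and time each fit. The sketch below is only an illustration: `time_each_model`, `slowest_first`, and `fake_fit` are hypothetical helper names (not part of pycaret), and the stand-in fit exists only so the snippet runs without pycaret installed. The commented lines at the end assume `exp` is the `TSForecastingExperiment` from the question after `setup`.

```python
import time
from typing import Callable, Dict, Iterable, List


def time_each_model(model_ids: Iterable[str],
                    fit_one: Callable[[str], object]) -> Dict[str, float]:
    """Fit each model ID on its own and record the wall-clock seconds taken."""
    timings: Dict[str, float] = {}
    for model_id in model_ids:
        # Print before fitting, so a model that stalls is visible by name.
        print(f"fitting {model_id} ...", flush=True)
        start = time.perf_counter()
        try:
            fit_one(model_id)
        except Exception as err:
            # Keep going so one failing model does not hide the others.
            print(f"{model_id}: failed ({err!r})")
            continue
        timings[model_id] = time.perf_counter() - start
        print(f"{model_id}: {timings[model_id]:.2f}s")
    return timings


def slowest_first(timings: Dict[str, float]) -> List[str]:
    """Return the model IDs ordered from slowest to fastest."""
    return sorted(timings, key=timings.get, reverse=True)


def fake_fit(model_id: str) -> None:
    """Stand-in for a real fit; sleeps longer for the 'slow' model."""
    time.sleep({"naive": 0.01, "auto_arima": 0.05}.get(model_id, 0.0))


demo_timings = time_each_model(["naive", "auto_arima"], fake_fit)
print(slowest_first(demo_timings))

# With pycaret, usage might look like this (assumption: `exp` is already set up):
# timings = time_each_model(exp.models().index,
#                           lambda m: exp.create_model(m, verbose=False))
# best = exp.compare_models(errors="raise", exclude=slowest_first(timings)[:1])
```

If a model never finishes, its "fitting ..." line will be the last thing printed, which identifies the ID to pass to `exclude`.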