Tags: python, time-series, pycaret

Is the test data used in PyCaret time series (beta) completely unseen by the model(s)?


After checking the official documentation and examples, I am still unsure whether the test data held out by the setup function is completely unseen by the model.

from pycaret.datasets import get_data
from pycaret.internal.pycaret_experiment import TimeSeriesExperiment

# get data
y = get_data('airline', verbose=False)

# no of future steps to forecast
fh = 12 # or alternately fh = np.arange(1,13)
fold = 3

# setup
exp = TimeSeriesExperiment()
exp.setup(data=y, fh=fh, fold=fold)
exp.models()

which gives the following description:

[image: setup summary]

I also checked the CV graph, from which we can conclude that the test data set is not used during cross-validation. But since this is not stated explicitly anywhere, I would like concrete confirmation.

Train-Test split: [image]

Train CV splits: [image]


Solution

  • If you look at the CV splits, you will see that they do not use the test data at all. So any step that relies on cross-validation, such as create_model, tune_model, blend_model, or compare_models, does not use the test data for training.

    Once you are happy with the models from these steps, you can finalize the model using finalize_model. Whatever model you pass to finalize_model is then retrained on the complete dataset (train + test) so that you can make true future predictions.
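
The split logic described in this answer can be checked with a small index-only sketch. It is an illustration, not PyCaret's actual implementation. It assumes that the last fh observations are held out as the test set and that cross-validation uses an expanding window over the training portion only. The helper names holdout_split and expanding_window_folds are hypothetical, and 144 is the length of the airline series used in the question.

```python
def holdout_split(n_obs, fh):
    """Reserve the last `fh` positions as the test set (assumed behaviour)."""
    train_idx = list(range(n_obs - fh))
    test_idx = list(range(n_obs - fh, n_obs))
    return train_idx, test_idx


def expanding_window_folds(train_idx, fh, n_folds):
    """Yield (fit_idx, validation_idx) pairs drawn only from `train_idx`."""
    n_train = len(train_idx)
    for k in range(n_folds):
        # The last fold's validation window ends at the end of the train set;
        # earlier folds are shifted back by one horizon each.
        val_end = n_train - (n_folds - 1 - k) * fh
        val_start = val_end - fh
        yield train_idx[:val_start], train_idx[val_start:val_end]


# Values from the question: 144 monthly points, fh = 12, fold = 3.
n_obs, fh, n_folds = 144, 12, 3
train_idx, test_idx = holdout_split(n_obs, fh)
folds = list(expanding_window_folds(train_idx, fh, n_folds))

# Collect every index that any CV fold fits or validates on.
seen_in_cv = set()
for fit_idx, val_idx in folds:
    seen_in_cv.update(fit_idx)
    seen_in_cv.update(val_idx)

# True only if some CV fold touched a held-out test index.
cv_touches_test = bool(seen_in_cv & set(test_idx))

# Finalizing retrains on everything: train + test.
finalize_idx = train_idx + test_idx

print("CV touches test data:", cv_touches_test)
print("Points used when finalizing:", len(finalize_idx))
```

Under these assumptions no fold ever sees a test index, while the finalized model is fit on all observations. That is why metrics on the held-out test set stay meaningful only for models that have not yet been passed to finalize_model.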