After checking the official documentation and examples, I am still unsure whether the test data held out from the data passed to the setup function is completely unseen by the model.
from pycaret.datasets import get_data
from pycaret.internal.pycaret_experiment import TimeSeriesExperiment
# get data
y = get_data('airline', verbose=False)
# no of future steps to forecast
fh = 12 # or alternately fh = np.arange(1,13)
fold = 3
# setup
exp = TimeSeriesExperiment()
exp.setup(data=y, fh=fh, fold = fold)
exp.models()
which gives the following description:
Also, judging from the CV plot, the test set does not appear to be used during cross-validation. But since this is not stated anywhere in the documentation, I need concrete evidence.
If you look at the CV splits, they do not use the test data at all. So any step that uses cross-validation, such as create_model, tune_model, blend_model, or compare_models, will not use the test data at all for training.
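To make this concrete, here is a minimal sketch in plain Python of how such splits can be laid out. It assumes the last fh points are held out as the test set and that the folds are expanding windows built only on the remaining points. The helper expanding_window_splits is hypothetical and is not PyCaret's actual implementation; 144 is the length of the airline series.

```python
def expanding_window_splits(n_obs, fh, fold):
    """Hold out the last `fh` points as test, then build `fold`
    expanding-window CV splits on the remaining (train) points only.

    Hypothetical helper for illustration; not PyCaret's internal code.
    """
    n_train = n_obs - fh
    test_idx = list(range(n_train, n_obs))
    splits = []
    for k in range(fold):
        # Each validation window has length fh; the last one ends
        # exactly where the train portion ends.
        val_end = n_train - (fold - 1 - k) * fh
        val_start = val_end - fh
        cv_train_idx = list(range(0, val_start))
        cv_val_idx = list(range(val_start, val_end))
        splits.append((cv_train_idx, cv_val_idx))
    return splits, test_idx


splits, test_idx = expanding_window_splits(n_obs=144, fh=12, fold=3)

for i, (cv_train_idx, cv_val_idx) in enumerate(splits):
    print(f"fold {i}: train 0..{cv_train_idx[-1]}, "
          f"validate {cv_val_idx[0]}..{cv_val_idx[-1]}")
print(f"test: {test_idx[0]}..{test_idx[-1]}")

# No CV fold (train or validation part) overlaps the test indices.
used_in_cv = {i for tr, val in splits for i in tr + val}
print("CV touches test data:", bool(used_in_cv & set(test_idx)))
```

Under these assumptions, every index used during cross-validation is below 132, while the test set covers indices 132 to 143.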
Once you are happy with the models from these steps, you can finalize the model using finalize_model. In this case, whatever model you pass to finalize_model is trained on the complete dataset (train + test) so that you can make true future predictions.
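The difference between the two stages can be sketched with a toy forecaster. This is only an illustration of the idea, not PyCaret code: NaiveForecaster and the synthetic series are hypothetical, and the "finalize" step is represented simply by refitting on train + test.

```python
class NaiveForecaster:
    """Toy forecaster: predicts the last observed value for every future step."""

    def fit(self, y):
        self.last_ = y[-1]        # last value seen during fitting
        self.n_seen_ = len(y)     # number of points used for fitting
        return self

    def predict(self, fh):
        return [self.last_] * fh


# Hypothetical series with 144 points (same length as the airline data).
y = list(range(100, 244))
fh = 12
train, test = y[:-fh], y[-fh:]

# Evaluation stage: the model only ever sees the train portion,
# so the test points remain unseen.
eval_model = NaiveForecaster().fit(train)

# Finalization stage: refit on train + test to forecast the real future.
final_model = NaiveForecaster().fit(train + test)

print(eval_model.n_seen_, final_model.n_seen_)
print(eval_model.predict(fh)[0], final_model.predict(fh)[0])
```

The point is that the test data stays unseen until you explicitly finalize; after that, the model's forecasts start beyond the end of the full series.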