I am using H2O AutoML in Python.
I used AutoML to find the best possible model: it is a StackedEnsemble.
Now I would like to take that model and retrain it on a bigger dataset (which was not possible before because it would exceed the RAM available on the free tier of Google Colab).
But AutoML applies some preprocessing to my data, and I don't know what it does.
How can I find out the preprocessing steps so that I can re-apply them to my bigger dataset before feeding it to the model?
Thanks in advance,
Gab
A Stacked Ensemble (SE) is a model built on the outputs of other models. To retrain the SE model, you will need to retrain the individual base models.
Apart from that, AutoML itself does not preprocess the data; it delegates preprocessing to the downstream models. There is one exception: target encoding (TE).
Did you enable target encoding in AutoML (for example, by passing preprocessing=["target_encoding"])?
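To see which models would need retraining, something along these lines could list the ensemble's base models together with their algorithm and parameters. This is only a sketch: the helper names `base_model_ids` and `summarize_base_models` are made up for illustration, and it assumes that the ensemble's `base_models` property returns model ids (as recent H2O versions appear to do), falling back to dicts with a "name" key or to model objects otherwise. The lookup function is passed in, so with a live cluster you would hand it `h2o.get_model`.

```python
def base_model_ids(ensemble):
    """Return the ids of the base models behind a stacked ensemble.

    Assumption: `ensemble.base_models` holds strings (model ids),
    dicts with a "name" key, or model objects with a `model_id`.
    """
    ids = []
    for entry in ensemble.base_models:
        if isinstance(entry, str):
            ids.append(entry)
        elif isinstance(entry, dict):
            ids.append(entry["name"])
        else:
            ids.append(entry.model_id)
    return ids


def summarize_base_models(ensemble, get_model):
    """Map each base model id to its algorithm and actual parameters.

    `get_model` is any callable that turns a model id into a model,
    e.g. `h2o.get_model` when connected to a cluster.
    """
    summary = {}
    for model_id in base_model_ids(ensemble):
        model = get_model(model_id)
        summary[model_id] = {
            "algo": model.algo,
            "params": dict(model.actual_params),
        }
    return summary


# Usage against a live cluster (not run here; `aml` is your H2OAutoML object):
#   import h2o
#   se = aml.leader  # assumes the leader is the StackedEnsemble
#   for model_id, info in summarize_base_models(se, h2o.get_model).items():
#       print(model_id, info["algo"])
```

Each listed base model would then have to be retrained on the bigger dataset with the same algorithm and parameters before a new ensemble can be built on top of them.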