python google-colaboratory h2o automl data-preprocessing

H2O AutoML - retrain a Stacked Ensemble from AutoML - preprocessing the data

I am using H2O AutoML in Python.

I used AutoML to find the best possible model: it is a StackedEnsemble.

Now I would like to take this model and retrain it on a bigger dataset (which was not possible before because it would exceed the free RAM available on Google Colab).

But AutoML applies some preprocessing to my data, and I don't know which.

How can I get the preprocessing steps so I can re-apply them to my bigger dataset before feeding it to the model?

Thanks in advance,

Gab

Solution

A Stacked Ensemble (SE) is a model built on the predictions of other models, its base models. To retrain the SE, you will need to retrain the individual base models as well.
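To make that dependency concrete, here is a minimal plain-Python sketch of stacking. It is a toy illustration, not H2O's implementation: the two base learners (`fit_mean`, `fit_line`) and the blend-weight metalearner are hypothetical stand-ins. It shows that the metalearner is fit on the cross-validated (out-of-fold) predictions of the base models, so retraining on a bigger dataset means refitting every base model first and then the metalearner. In H2O the same constraint appears as the requirement that all base models be trained with the same `nfolds` and `fold_assignment` and with `keep_cross_validation_predictions=True`.

```python
# Toy stacking sketch (hypothetical helpers, NOT H2O's implementation).
# The metalearner learns how to combine base-model predictions, so it is
# only meaningful for the exact base models it was fit against.

def fit_mean(xs, ys):
    """Base learner 1: always predicts the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Base learner 2: 1-D least-squares line y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
    b = my - a * mx
    return lambda x: a * x + b

# This toy metalearner assumes exactly two base learners.
BASE_LEARNERS = [fit_mean, fit_line]

def out_of_fold_predictions(xs, ys, nfolds):
    """Cross-validated predictions with modulo folds (row i -> fold i % nfolds)."""
    oof = [[0.0] * len(BASE_LEARNERS) for _ in xs]
    for k in range(nfolds):
        train = [i for i in range(len(xs)) if i % nfolds != k]
        hold = [i for i in range(len(xs)) if i % nfolds == k]
        for j, learner in enumerate(BASE_LEARNERS):
            model = learner([xs[i] for i in train], [ys[i] for i in train])
            for i in hold:
                oof[i][j] = model(xs[i])
    return oof

def fit_metalearner(oof, ys):
    """Blend weight w for w*p1 + (1-w)*p2, least squares, clipped to [0, 1]."""
    d = [p1 - p2 for p1, p2 in oof]
    r = [y - p2 for y, (_, p2) in zip(ys, oof)]
    den = sum(v * v for v in d)
    w = sum(a * b for a, b in zip(r, d)) / den if den else 0.5
    return min(1.0, max(0.0, w))

def fit_stacked_ensemble(xs, ys, nfolds=3):
    """Full retrain: metalearner on OOF predictions, base models on all rows."""
    w = fit_metalearner(out_of_fold_predictions(xs, ys, nfolds), ys)
    bases = [learner(xs, ys) for learner in BASE_LEARNERS]
    return lambda x: w * bases[0](x) + (1 - w) * bases[1](x)
```

For example, `se = fit_stacked_ensemble(list(range(9)), [2 * x + 1 for x in range(9)])` puts all the weight on the line model, and `se(10)` returns `21.0`. Calling `fit_stacked_ensemble` again on the bigger dataset is the toy equivalent of retraining the whole SE: reusing the old metalearner with new base models (or the reverse) would combine predictions it was never fit on.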

Apart from that, AutoML does not preprocess the data itself. It delegates preprocessing to the downstream models, each of which handles its own. There is one exception: target encoding.

Did you enable target encoding (TE) in AutoML?
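If TE was enabled, it is the one step that has to be reproduced on the bigger dataset. Below is a simplified plain-Python version of the idea, not H2O's implementation: `fit_target_encoding` and `apply_target_encoding` are hypothetical helpers, and the additive `smoothing` term is a stand-in for the blending, noise and data-leakage-handling options of H2O's own `H2OTargetEncoderEstimator`. It shows that the encoding is a mapping learned from the training target, so it must either be refit on the new data or stored and re-applied, with a fallback for categories not seen during fitting.

```python
# Simplified target-encoding sketch (hypothetical helpers, NOT H2O's implementation).
from collections import defaultdict

def fit_target_encoding(categories, targets, smoothing=10.0):
    """Learn category -> mean target, shrunk toward the global mean."""
    global_mean = sum(targets) / len(targets)
    sums, counts = defaultdict(float), defaultdict(int)
    for c, y in zip(categories, targets):
        sums[c] += y
        counts[c] += 1
    # Additive smoothing: rare categories are pulled toward the global mean.
    mapping = {
        c: (sums[c] + smoothing * global_mean) / (counts[c] + smoothing)
        for c in counts
    }
    return mapping, global_mean

def apply_target_encoding(categories, mapping, global_mean):
    """Re-apply a stored mapping; unseen categories get the global mean."""
    return [mapping.get(c, global_mean) for c in categories]
```

For example, `mapping, prior = fit_target_encoding(["a", "a", "b"], [1, 1, 0], smoothing=0.0)` gives `{"a": 1.0, "b": 0.0}`, and `apply_target_encoding(["a", "b", "c"], mapping, prior)` encodes the unseen `"c"` as the global mean `2/3`. Fitting the mapping on training rows only and reusing it on later data keeps the encoding consistent with what the model was trained on.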