python google-colaboratory h2o automl data-preprocessing

h2o AutoML - retrain stacked ensemble from autoML - preprocessing the data


I am using H2O AutoML in Python.

I used AutoML to find the best possible model, which turned out to be a StackedEnsemble.

Now I would like to take that model and retrain it on a bigger dataset (which was not possible before because it would exceed the RAM available on the free Google Colab tier).

But AutoML applies some preprocessing to my data, and I don't know what it does.

How can I get the preprocessing steps so I can re-apply them to my bigger dataset before feeding it to the model?

Thanks in advance,

Gab


Solution

  • A Stacked Ensemble (SE) is a model built on the outputs of other models. To retrain the SE model, you first need to retrain the individual base models and then retrain the ensemble on top of them.

    Apart from that, AutoML does not pre-process the data itself. It delegates pre-processing to the downstream models, which each handle it internally. There is one exception: target encoding (TE).

    Did you enable TE in AutoML?
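
To see which models need retraining, the base models behind the ensemble can be listed along with the parameters that differ from their defaults. The following is a minimal sketch, not a definitive recipe. The helper names `base_model_ids` and `describe_ensemble` are hypothetical, and the code assumes that the ensemble's `params["base_models"]["actual"]` entry holds either plain model-ID strings or key dictionaries with a `"name"` field (the exact shape may differ between H2O versions).

```python
def base_model_ids(ensemble_params):
    """Return base-model IDs from a stacked ensemble's `params` dict.

    Assumption: params["base_models"]["actual"] is a list whose items are
    either model-ID strings or dicts with a "name" key.
    """
    entries = ensemble_params["base_models"]["actual"] or []
    return [e["name"] if isinstance(e, dict) else str(e) for e in entries]


def describe_ensemble(leader):
    """Print the algorithm and non-default parameters of each base model."""
    # Imported lazily: this part needs the h2o package and a running
    # cluster (h2o.init()) that still holds the AutoML models.
    import h2o

    for model_id in base_model_ids(leader.params):
        model = h2o.get_model(model_id)
        # Keep only parameters whose actual value differs from the default.
        changed = {
            name: p["actual"]
            for name, p in model.params.items()
            if p["actual"] != p["default"]
        }
        print(model_id, model.algo, changed)
```

With an `H2OAutoML` object named `aml`, this would be called as `describe_ensemble(aml.leader)`. Each listed model can then be retrained on the bigger frame with the same settings, using cross-validation with identical folds and `keep_cross_validation_predictions=True`, before fitting a new `H2OStackedEnsembleEstimator` with those models as `base_models`.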