
Is there a python equivalent for R's h2o.stack?

I am working with stacked learners. According to the docs for H2OStackedEnsembleEstimator, h2o's Python implementation lets you easily build ensemble models. However, it is limited to base classifiers built on the same underlying training data. I have time-based features whose minimum date varies depending on the data source. Each sample of data is a point in time. To take advantage of as much data as I can, I split the features into two groups (depending on relevance and minimum date) and train two separate models. I would like to combine these models, but H2OStackedEnsembleEstimator requires the features to be the same.

According to this post about R's stacked ensemble implementation, there is an option to perform only the metalearning step, which should require only the k-fold cross-validation predictions for each base model and the true target value.

In case it crosses anyone's mind: for my particular problem, I realize I am going to run into an issue in the metalearning step because of this mismatch in minimum date, and I have ideas to circumvent it.


  • For the Super Learner algorithm (stacking such that you use the cross-validated predicted values from the base learners as training data for the metalearner), the only requirement is that the base learners must be trained on the same rows -- the columns can be different. There is a variant of stacking, let's call it "Holdout Stacking", where you score the base models on a holdout dataset and use those predictions to train the metalearner instead. In this case, you can use entirely different training frames for the base learners.

    The current Stacked Ensembles implementation in H2O has a restriction that the whole training frame (rows and columns) must be the same for the base learners, but we will relax that requirement in the future (since it's not really required).

    Before we moved Stacked Ensembles into the Java backend of H2O, I coded a simple reference implementation in Python using only the h2o Python module. For the time being, you could probably modify that code fairly easily to get the type of Stacked Ensemble that you're looking for. It's in a gist here.
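To make the Super Learner idea above concrete, here is a minimal sketch in which two base learners see different columns but the same rows, and the metalearner is trained on their cross-validated predictions. It is not the code from the gist and does not use the h2o module: it assumes NumPy with plain least-squares models as stand-ins, and every name in it (`X_a`, `X_b`, `fit_linear`, `cv_predictions`, `ensemble_predict`) as well as the synthetic data is hypothetical.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Apply coefficients produced by fit_linear to new rows."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef


def cv_predictions(X, y, folds):
    """Out-of-fold predictions: each row is scored by a model that never saw it."""
    oof = np.empty(len(y))
    for k in np.unique(folds):
        held_out = folds == k
        coef = fit_linear(X[~held_out], y[~held_out])
        oof[held_out] = predict_linear(coef, X[held_out])
    return oof


# Synthetic data (an assumption for illustration): same rows, two feature groups.
rng = np.random.default_rng(0)
n = 200
X_a = rng.normal(size=(n, 3))  # hypothetical feature group A
X_b = rng.normal(size=(n, 2))  # hypothetical feature group B
y = (X_a @ np.array([1.0, -2.0, 0.5])
     + X_b @ np.array([3.0, 1.0])
     + rng.normal(scale=0.1, size=n))

# Every base learner must use the same fold assignment over the same rows.
folds = np.arange(n) % 5

# Level-one data: one column of cross-validated predictions per base learner.
level_one = np.column_stack([
    cv_predictions(X_a, y, folds),
    cv_predictions(X_b, y, folds),
])

# The metalearner only needs the level-one data and the true target.
meta_coef = fit_linear(level_one, y)

# Final base models are refit on all rows and used at prediction time.
coef_a = fit_linear(X_a, y)
coef_b = fit_linear(X_b, y)


def ensemble_predict(Xa_new, Xb_new):
    """Score both base models, then combine their outputs with the metalearner."""
    base = np.column_stack([
        predict_linear(coef_a, Xa_new),
        predict_linear(coef_b, Xb_new),
    ])
    return predict_linear(meta_coef, base)
```

For the "Holdout Stacking" variant, the same metalearner step would be fit on base-model predictions for a separate holdout set instead of on `level_one`, which is what removes the requirement that the base learners share training rows.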