h2o H2OGenericEstimator training function not working


I want to enable incremental training using H2O models. I save the model trained on the existing dataset/observations in MOJO format. When new observations arrive, I would like to load the MOJO-based model and retrain it on the new observations. However, this is not working.

Alternatively, I can train a specific model class, e.g. H2OGradientBoostingEstimator, on the combined dataset, but that would require me to keep track of all previous observations and would increase disk usage.

The H2O documentation for H2OGenericEstimator shows support for a train function. However, in my experiments, calling train doesn't make any difference.

import tempfile

import h2o
from h2o.estimators import H2OGenericEstimator, H2OXGBoostEstimator

h2o.init()

# Train a small XGBoost model on the public airlines dataset
airlines = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/testng/airlines_train.csv")
y = "IsDepDelayed"
x = ["fYear", "fMonth", "Origin", "Dest", "Distance"]
xgb = H2OXGBoostEstimator(ntrees=1, nfolds=3)
xgb.train(x=x, y=y, training_frame=airlines)

# Export the trained model as a MOJO into a temporary directory
mojo_dir = tempfile.mkdtemp()
mojo_path = xgb.download_mojo(mojo_dir)

# Load the MOJO back as a generic model
key = h2o.lazy_import(mojo_path)
fr = h2o.get_frame(key[0])
model = H2OGenericEstimator(model_key=fr)
model.train()  # expected this to retrain, but it makes no difference
model.auc()

Is there any way to further train a model loaded from a MOJO file?


Solution

  • Currently, an H2O generic estimator loaded from a MOJO file can perform scoring but cannot be trained again.

    If you are interested in continuing to train a previously built model, please consider using the checkpoint option. Here is the documentation on it: https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/algo-params/checkpoint.html
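
The checkpoint suggestion above could look roughly like the sketch below. It is not a definitive recipe, and it rests on several assumptions: the earlier model was a GBM saved in H2O's binary format with h2o.save_model (not as a MOJO), it was trained with default hyperparameters, and the same H2O version is used to load it. The helper names total_trees and continue_training are hypothetical. Which parameters and data may change between runs is governed by the checkpoint documentation linked above, so check it before relying on this for new observations.

```python
def total_trees(previous_ntrees, extra_trees):
    """Return the ntrees value to request when resuming from a checkpoint.

    With checkpointing, ntrees is the total for the continued model, so it
    must be larger than the number of trees already built.
    """
    if extra_trees < 1:
        raise ValueError("extra_trees must be at least 1")
    return previous_ntrees + extra_trees


def continue_training(saved_model_path, new_frame, x, y,
                      previous_ntrees, extra_trees=10):
    """Resume a binary-saved GBM on new_frame (hypothetical helper)."""
    # Imported lazily; a cluster must already be running (h2o.init()).
    import h2o
    from h2o.estimators import H2OGradientBoostingEstimator

    # Assumption: this path was written earlier by h2o.save_model(),
    # i.e. it is a binary model, not a MOJO.
    previous = h2o.load_model(saved_model_path)

    # Assumption: the original model used default hyperparameters, so
    # only ntrees and checkpoint are set here.
    resumed = H2OGradientBoostingEstimator(
        ntrees=total_trees(previous_ntrees, extra_trees),
        checkpoint=previous.model_id,
    )
    resumed.train(x=x, y=y, training_frame=new_frame)
    return resumed
```

A call might look like continue_training(path, new_obs, x, y, previous_ntrees=50), where path is the value returned by h2o.save_model for the earlier model and new_obs is an H2OFrame of the new observations. The MOJO can still be exported separately for scoring.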