Get the number of trees used for a GBM with early stopping


I trained a GBM in h2o using early stopping and setting ntrees=10000. I want to retrieve the number of trees that are actually in the model. But if I call model.params['ntrees'] (where model is the best model from a grid search) I get

{'default': 50, 'actual': 10000}

where the 10000 is the parameter I set during training but not the actual number of trees that ended up in the model.

If I call model.score_history() then I can see that early stopping kicked in at 280 trees. But surely there is a more direct way to find out the actual number of trees in the model than this hack:

best_model.score_history()['number_of_trees'].max()

Solution

  • There currently isn't a clean way to do this. An alternative that avoids computing a max, though it is still clunky, is model.summary()['number_of_trees'][0] if you want the number itself, or model.summary()['number_of_trees'] if you want it in a list. Or just call model.summary() if you only want to see the number.
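
If you need this in more than one place, the two lookups can be wrapped in small helpers. This is only a sketch: the function names trees_from_summary and trees_from_history are made up for illustration, and it assumes model is a trained H2O tree model whose summary() and score_history() both expose a 'number_of_trees' column, as described in the question and answer above.

```python
def trees_from_summary(model):
    """Read the tree count from the model summary table.

    Assumes model.summary()['number_of_trees'] is a one-element column,
    as described in the answer above.
    """
    return int(model.summary()['number_of_trees'][0])


def trees_from_history(model):
    """Read the tree count from the scoring history.

    Takes the largest value in the 'number_of_trees' column, which is
    the approach the question calls a hack.
    """
    return int(max(model.score_history()['number_of_trees']))


# Hypothetical usage, where best_model is the best model from the grid search:
# n_trees = trees_from_summary(best_model)
```

Both helpers return a plain int, so the result can be logged or reused directly, for example as ntrees when retraining without early stopping.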