Python version: 3.5 // xgboost version: 0.7.post3
Hi everyone,
I am trying to implement incremental learning with the xgboost module in Python, where my target variable is binary. I think I am supposed to set the parameter "process_type": "update". The problem is that I get an error I have not been able to solve. Below is an example of my code using the breast cancer dataset from sklearn, so anyone can try it. Does anybody know how I can prevent the following error?
from sklearn import datasets
import xgboost
X_all = datasets.load_breast_cancer().data
y_all = datasets.load_breast_cancer().target
X_first_half = X_all[0:280,:]
X_second_half = X_all[280:,:]
y_first_half = y_all[0:280]
y_second_half = y_all[280:]
model1 = xgboost \
.train({'objective': 'binary:logistic'},
dtrain=xgboost.DMatrix(X_first_half, y_first_half),
xgb_model=None)
model2 = xgboost \
.train({'objective': 'binary:logistic',
'process_type': 'update',
'update': 'refresh',
'refresh_leaf': True},
dtrain=xgboost.DMatrix(X_second_half, y_second_half),
xgb_model=model1)
The error that I get is:
XGBoostError: b'[15:03:03] src/tree/updater_colmaker.cc:118:
Check failed: tree.param.num_nodes == tree.param.num_roots (19 vs. 1)
ColMaker: can only grow new tree\n\nStack trace returned 1 entries:\n[bt] (0)
I think he is trying to achieve a sort of batch training, that is, to further train the model on new data points without adding more trees to the ensemble. In other words, to update the existing trees and their leaves to fit the new data points.
From the docs:
process_type [default='default']
A type of boosting process to run. Choices: {'default', 'update'}
'default': the normal boosting process, which creates new trees.
'update': starts from an existing model and only updates its trees. In each boosting iteration, a tree from the initial model is taken, a specified sequence of updater plugins is run for that tree, and a modified tree is added to the new model. The new model would have either the same or smaller number of trees, depending on the number of boosting iterations performed. Currently, the following built-in updater plugins could be meaningfully used with this process type: 'refresh', 'prune'. With 'update', one cannot use updater plugins that create new trees.