scikit-learn, xgboost

Is this a bug in xgboost's XGBClassifier?


import numpy as np
from xgboost import XGBClassifier

model = XGBClassifier(
    use_label_encoder=False,
    label_lower_bound=0, label_upper_bound=1
    # setting the bounds doesn't seem to help
)

x = np.array([ [1,2,3], [4,5,6] ], 'ushort' )
y = [ 1, 1 ]

try:
    model.fit(x, y)
    # this fails with ValueError:
    #  "The label must consist of integer labels
    #  of form 0, 1, 2, ..., [num_class - 1]."
except Exception as e:
    print(e)

y = [ 0, 0 ]
# this works
model.fit(x,y)

model = XGBClassifier()
y = [ 1, 1 ]
# this works, but with UserWarning:
# "The use of label encoder in XGBClassifier is deprecated, etc."
model.fit(x,y)

It seems to me that the label encoder is deprecated, yet we are FORCED to use it whenever our class labels don't happen to include a zero.


Solution

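  • I had the same problem. I solved it by passing use_label_encoder=False as a parameter, and the warning message disappeared.

    I think the problem in your case is that y contains only 1, but with the label encoder turned off XGBoost expects the class labels to be consecutive integers starting from 0 (that is what the ValueError message says). With use_label_encoder=False, replacing y = [ 1, 1 ] with y = [ 0, 0 ] should make the fit succeed, with neither the ValueError nor the UserWarning.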
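
If the labels can't simply be rewritten by hand, one option is to remap them to 0, 1, 2, ... yourself before calling fit, and map the predictions back afterwards. The sketch below does this in plain Python. The helper names encode_labels and decode_labels are made up for illustration and are not part of xgboost; scikit-learn's LabelEncoder (fit_transform / inverse_transform) does the same job.

```python
def encode_labels(y):
    """Map arbitrary class labels to consecutive integers 0..n_classes-1.

    Hypothetical helper, not an xgboost function.
    """
    classes = sorted(set(y))
    index = {label: i for i, label in enumerate(classes)}
    return [index[label] for label in y], classes


def decode_labels(encoded, classes):
    """Map encoded integers back to the original labels."""
    return [classes[i] for i in encoded]


y = [1, 1]
y_encoded, classes = encode_labels(y)
print(y_encoded)                          # [0, 0]
print(decode_labels(y_encoded, classes))  # [1, 1]

# Assumed usage with the model and x from the question (not run here):
# model.fit(x, y_encoded)
# predictions = decode_labels(model.predict(x).tolist(), classes)
```

Keeping the classes list around is what lets you translate the model's predictions back into the original labels.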