
Getting a weird error when trying to run xgboost.predict or xgboost.score


I'm trying to run an xgboost regressor model on a dataset without any missing data.

import xgboost as xgb

# Run GBM on training dataset
# Create the xgboost regressor
pts_xgb = xgb.XGBRegressor(objective="reg:squarederror", missing=None, seed=42)

# Fit xgboost onto the data, with early stopping evaluated on the test set
pts_xgb.fit(
    X_train,
    y_train,
    verbose=True,
    early_stopping_rounds=10,
    eval_metric='rmse',
    eval_set=[(X_test, y_test)],
)

The model creation seems to work fine, and I confirmed that X_train and y_train have no null values, using the following:

print(X_train.isnull().values.sum()) # prints 0
print(y_train.isnull().values.sum()) # prints 0

But when I run the following code, I get the below error.

Code:

pts_xgb.score(X_train,y_train)

Error:

---------------------------------------------------------------------------
XGBoostError                              Traceback (most recent call last)
<ipython-input-37-39b223d418b2> in <module>
----> 1 pts_xgb.score(X_train_test,y_train_test)

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sklearn/base.py in score(self, X, y, sample_weight)
    551 
    552         from .metrics import r2_score
--> 553         y_pred = self.predict(X)
    554         return r2_score(y, y_pred, sample_weight=sample_weight)
    555 

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/xgboost/sklearn.py in predict(self, X, output_margin, ntree_limit, validate_features, base_margin, iteration_range)
    818         if self._can_use_inplace_predict():
    819             try:
--> 820                 predts = self.get_booster().inplace_predict(
    821                     data=X,
    822                     iteration_range=iteration_range,

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/xgboost/core.py in inplace_predict(self, data, iteration_range, predict_type, missing, validate_features, base_margin, strict_shape)
   1844             from .data import _maybe_np_slice
   1845             data = _maybe_np_slice(data, data.dtype)
-> 1846             _check_call(
   1847                 _LIB.XGBoosterPredictFromDense(
   1848                     self.handle,

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/xgboost/core.py in _check_call(ret)
    208     """
    209     if ret != 0:
--> 210         raise XGBoostError(py_str(_LIB.XGBGetLastError()))
    211 
    212 

XGBoostError: [09:18:58] /Users/travis/build/dmlc/xgboost/src/c_api/c_api_utils.h:157: Invalid missing value: null
Stack trace:
  [bt] (0) 1   libxgboost.dylib                    0x000000011e4e7064 dmlc::LogMessageFatal::~LogMessageFatal() + 116
  [bt] (1) 2   libxgboost.dylib                    0x000000011e4d9afc xgboost::GetMissing(xgboost::Json const&) + 268
  [bt] (2) 3   libxgboost.dylib                    0x000000011e4e0a13 void InplacePredictImpl<xgboost::data::ArrayAdapter>(std::__1::shared_ptr<xgboost::data::ArrayAdapter>, std::__1::shared_ptr<xgboost::DMatrix>, char const*, xgboost::Learner*, unsigned long, unsigned long, unsigned long long const**, unsigned long long*, float const**) + 531
  [bt] (3) 4   libxgboost.dylib                    0x000000011e4e04d3 XGBoosterPredictFromDense + 339
  [bt] (4) 5   libffi.dylib                        0x00007fff2dc7f8e5 ffi_call_unix64 + 85

Same error occurs if I try to run pts_xgb.predict(X_train)

Edit: this is not an issue with any missing/null values in either X_train or y_train. I got the same error when using the following dataset, which is much smaller than my actual dataset:

X_train: [image]

y_train: [image]

Anyone have any idea why this may be happening? I couldn't find any other forums that discuss the same issue.


Solution

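  • This IS a missing/null value problem, but the null is in the missing parameter of the model, not in your data. The message "Invalid missing value: null" means that missing=None reached the native predictor as null, where a number is expected.

    Instead of xgb.XGBRegressor(objective="reg:squarederror", missing=None, seed=42)

    try xgb.XGBRegressor(objective="reg:squarederror", missing=1, seed=42)

    Keep in mind that missing is the value XGBoost treats as missing, so missing=1 makes every 1 in the data count as missing. If the data really has no missing values, leaving missing out (it defaults to np.nan) or passing missing=np.nan should avoid the error without that side effect.

    For the reason, see the answer to: How to use missing parameter of XGBRegressor of scikit-learn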
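
To see where the "null" in the error message likely comes from, here is a minimal sketch using only the standard library. It assumes that the prediction arguments are passed to the native library as JSON, which is an inference from the xgboost::GetMissing(xgboost::Json const&) frame in the stack trace rather than something confirmed here. The helper normalize_missing is hypothetical and not part of xgboost.

```python
import json


def normalize_missing(missing):
    """Hypothetical helper: turn a user-supplied `missing` into a float.

    None is mapped to NaN, which is XGBRegressor's documented default
    for `missing`; anything else is converted with float().
    """
    if missing is None:
        return float("nan")
    return float(missing)


# Python's None serializes to JSON null, which matches the
# "Invalid missing value: null" text in the error message.
print(json.dumps({"missing": None}))  # {"missing": null}

# A float NaN serializes to a numeric token instead.
print(json.dumps({"missing": normalize_missing(None)}))  # {"missing": NaN}
```

In the question's code, the equivalent change would be to pass missing=normalize_missing(None), or simply to drop the missing argument so the default is used.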