R xgboost error with input data containing 'inf' or 'nan' but it has worked before


I am using the R xgboost package, version 1.4.1.1, for a binary classification task that I have been delivering regularly for nearly 2 years.

I recently had an upgrade to my company laptop and needed to install R and the libraries I am using.

I am now trying to run this task and it gives me the following error:

Error in xgb.DMatrix(data, label = label, missing = missing) : 
  [14:28:40] amalgamation/../src/data/data.cc:945: Check failed: valid: Input data contains `inf` or `nan`

The data pipeline has not changed, and the data structure is exactly the same as before. I convert the data to a data matrix, as below:

xgbmodel <- xgboost(data = data.matrix(mydata),
                    label = res,
                    eta = 0.2,
                    max_depth = 10,
                    gamma = 0.4,
                    lambda = 0.5,
                    nrounds = 40,
                    subsample = 0.7,
                    colsample_bytree = 0.75,
                    seed = 21,
                    eval_metric = "logloss",
                    objective = "binary:logistic"
)

And I get this error which I have never got before.

R version: 4.1.0

xgboost: 1.4.1.1

Any ideas on how to resolve this?

Edit: I uninstalled xgboost 1.4.1.1 and installed version 1.1.1.1. It's working. It seems to be a problem with the version and not with the data. I would like to use the latest version, that's why it would be good to know if anyone else has a similar issue and how to fix it.


Solution

  • I got the same error. My guess is that the previous version of xgboost was handling missing values itself, which is no longer the case. Either drop the rows containing NA, or impute them (as xgboost was doing before the update), for example with KNN imputation or with the mean or median. Not sure if this is the answer you were looking for.
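
    The two options above (dropping or imputing) can be sketched in base R as follows. This is only an illustration under assumptions: the matrix x and its column names are made up and stand in for data.matrix(mydata), and impute_median is a hypothetical helper, not part of xgboost. Since the error message mentions inf as well as nan, the sketch counts both kinds of value before cleaning; which of them triggers the check in a given dataset is something to verify on the real data.

```r
# Toy stand-in for data.matrix(mydata); the column names are hypothetical.
x <- cbind(a = c(1, 2, NA, 4),
           b = c(0.5, Inf, 1.5, 2.5),
           c = c(10, 20, 30, 40))

# Count NA/NaN entries and +/-Inf entries per column to see where the problem is.
na_per_col  <- colSums(is.na(x))
inf_per_col <- colSums(is.infinite(x))

# Option 1: drop every row that contains a non-finite value (NA, NaN, Inf, -Inf).
keep <- rowSums(!is.finite(x)) == 0
x_dropped <- x[keep, , drop = FALSE]

# Option 2: replace non-finite values with the median of the finite values
# in the same column.
impute_median <- function(col) {
  bad <- !is.finite(col)
  col[bad] <- median(col[!bad])
  col
}
x_imputed <- apply(x, 2, impute_median)
```

    If rows are dropped, the label vector has to be subset with the same index (for example res[keep], assuming res is aligned with the rows of the data) before both are passed to xgboost().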