
R - figuring out what columns an xgboost model is expecting in new data for predictions


We have a .model file containing a saved xgboost model. Here's a snippet of our code loading the model:

> xg_model <- xgb.load("../model_outputs/our_saved_model.model")
> xg_model
##### xgb.Booster
raw: 1.6 Mb 
xgb.attributes:
  niter
niter: 149

I didn't create this model, but I have been tasked with passing new data to it to make predictions. Unfortunately, I am hitting this error:

Error in predict.xgb.Booster(xg_model, xgb.DMatrix(as.matrix(our_dataframe_of_data))) : 
  [01:34:01] amalgamation/../src/learner.cc:1183: Check failed: learner_model_param_.num_feature >= p_fmat->Info().num_col_ (38 vs. 40) : Number of columns does not match number of features in booster.

So it's clear that our data frame has 40 columns, while the model was trained on 38 features. What's unclear is exactly which 38 columns xg_model expects. Is there a function to call, a plot to draw, or anything else that would show which 38 columns the model was trained on? We currently have only the trained model, not the R code that trained it.


Solution

  • Which XGBoost version are you using? It matters, because the way XGBoost records a model's "schema specification" has changed considerably between versions.

    For now, check which fields are available on your xgb.Booster object. In particular, see whether nfeatures and feature_names are defined:

    print(xg_model$nfeatures)
    print(xg_model$feature_names)
    

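    I would expect at least the feature count to be available, because how else would the booster know to demand 38 features? Note, though, that the error message takes that number from the model's internal num_feature parameter, so these R-level fields may still print NULL for a model restored with xgb.load(). If feature_names is NULL, the column names were probably not saved with the model and would have to come from whoever trained it.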
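
If feature_names does return the 38 names, the remaining step is to line the new data up with them. Below is a minimal base-R sketch of that comparison. The vector `expected` and the data frame `new_data` are hypothetical stand-ins for xg_model$feature_names and our_dataframe_of_data, and the column names in them are invented for illustration.

```r
# Hypothetical stand-in for xg_model$feature_names (assumed to be non-NULL).
expected <- c("age", "income", "score")

# Hypothetical stand-in for our_dataframe_of_data, with two surplus columns.
new_data <- data.frame(
  id     = 1:3,
  score  = c(0.2, 0.5, 0.9),
  age    = c(31, 45, 28),
  income = c(52000, 61000, 48000),
  notes  = c(1, 0, 1)
)

# Columns present in the data that the model does not know about.
extra_cols <- setdiff(colnames(new_data), expected)

# Columns the model expects that the data does not provide.
missing_cols <- setdiff(expected, colnames(new_data))

if (length(missing_cols) > 0) {
  stop("Missing columns: ", paste(missing_cols, collapse = ", "))
}

# Keep only the expected columns, in the order the model lists them.
aligned <- as.matrix(new_data[, expected, drop = FALSE])

print(extra_cols)
print(colnames(aligned))
```

Here `extra_cols` lists the columns to drop (two of them in the 38 vs. 40 case from the question), and `aligned` is the matrix that would be passed to xgb.DMatrix() in place of as.matrix(our_dataframe_of_data). Reordering the columns is a precaution, on the assumption that the model may match features by position rather than by name.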