
How to set important features as an attribute on XGBRegressor and save them as part of the JSON when saving the model


I have trained an XGBRegressor model. Now I am trying to store the important features as an attribute on the model, and I want that attribute to be saved and restored along with the model.

I have two issues:

1.

regressor.fit(X=X_train, y=y_train, eval_set=[(X_train, y_train), (X_validation, y_validation)], verbose=False)

feature_importance: List[Tuple[str, float]] = sorted(
    regressor.get_booster().get_score(importance_type="gain").items(),
    key=lambda x: x[1],
)
selected_features: List[str] = [x[0] for x in feature_importance if x[1] > 0]
setattr(regressor, "selected_features", selected_features)

The setattr call and the corresponding getattr call give me lint warnings (B010 and B009). Is there a better way to do this that avoids those warnings?

The getattr usage looks like this:

def get_model_features(model: XGBRegressor) -> List[str] | None:
    return getattr(model, "selected_features") if (model is not None and isinstance(model, XGBRegressor)) else None
 
2. The attribute does not get saved in the JSON file. I am using the following call to save the model:

regressor.save_model(fname="model.json")

How can I accomplish this? I want to avoid pickle save/restore.


Solution

  • The attribute does not get saved in the json file

    This is the expected behaviour.

    The XGBRegressor.save_model(fname) method call simply "redirects" to the Booster.save_model(fname) method call. Any attributes that were defined in the top-most Scikit-Learn layer (such as custom feature importance attributes) will not be propagated along.

    The underlying XGBoost model saver/loader (via JSON/UBJSON) does not contain any logic for persisting arbitrary Python attributes set on the wrapper. It only stores real model data, i.e. data that XGBoost itself actually uses.

    If you want to save Scikit-Learn wrappers together with their custom attributes, then you must keep using the pickle data format. There is no way around that.
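If pickle is not acceptable, one possible workaround is to keep the feature list outside the wrapper, in a small JSON file written next to model.json. The sketch below is only an illustration under that assumption: the helper names and the sidecar file name are hypothetical and not part of XGBoost. It also touches on the lint question: B010 and B009 flag setattr/getattr calls that use a constant attribute name and no default, so plain attribute assignment and a getattr call with a default value should avoid both warnings.

```python
import json
from pathlib import Path
from typing import Any, List, Optional


def set_model_features(model: Any, features: List[str]) -> None:
    # Plain assignment does the same thing as
    # setattr(model, "selected_features", ...) without triggering B010.
    model.selected_features = list(features)


def get_model_features(model: Any) -> Optional[List[str]]:
    # getattr with a default is not the pattern B009 flags, and it also
    # covers None and models that never had the attribute set.
    return getattr(model, "selected_features", None)


def save_selected_features(model: Any, path: str) -> None:
    # Hypothetical sidecar file, e.g. "model.features.json" next to "model.json".
    features = get_model_features(model) or []
    Path(path).write_text(json.dumps({"selected_features": features}))


def load_selected_features(model: Any, path: str) -> None:
    # Re-attach the stored list to a freshly loaded model object.
    data = json.loads(Path(path).read_text())
    set_model_features(model, data["selected_features"])
```

With these helpers, saving would be regressor.save_model(fname="model.json") followed by save_selected_features(regressor, "model.features.json"), and restoring would be load_model followed by load_selected_features on the new regressor. The model file itself stays in plain XGBoost JSON, and the custom metadata travels as a second file.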