I have trained an XGBRegressor model. Now I am trying to store the important features as an attribute on the model, and I want that attribute to be saved and restored along with the model.
I have 2 issues here -
1.
from typing import List, Tuple

regressor.fit(X=X_train, y=y_train, eval_set=[(X_train, y_train), (X_validation, y_validation)], verbose=False)
# Sort features by gain (ascending) and keep those with a positive score.
feature_importance: List[Tuple[str, float]] = sorted(
    regressor.get_booster().get_score(importance_type="gain").items(), key=lambda x: x[1]
)
selected_features: List[str] = [x[0] for x in feature_importance if x[1] > 0]
setattr(regressor, "selected_features", selected_features)
The setattr and the corresponding getattr calls give me lint warnings (B010 and B009). Is there a better way to do this that avoids those warnings?
The getattr usage is something like this:
def get_model_features(model: XGBRegressor) -> List[str] | None:
    return getattr(model, "selected_features") if (model is not None and isinstance(model, XGBRegressor)) else None
2.
regressor.save_model(fname="model.json")
The selected_features attribute is not stored in model.json. How can I accomplish this? I want to avoid pickle save/restore.
Regarding "The attribute does not get saved in the json file": this is the expected behaviour.
The XGBRegressor.save_model(fname) method call simply "redirects" to the Booster.save_model(fname) method call. Any attributes that were defined in the top-most Scikit-Learn layer (such as custom feature importance attributes) will not be propagated along.
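Because such attributes are ordinary Python instance attributes on the wrapper object, they can be set and read with plain attribute syntax, which is what the B010 and B009 warnings from the question are asking for. Here is a minimal sketch, assuming a stand-in class (DummyRegressor is hypothetical and only replaces XGBRegressor so the snippet runs without xgboost installed). As far as I know, B009 only fires on the two-argument form of getattr, so passing a default should also keep the linter quiet:

```python
from typing import List, Optional


class DummyRegressor:
    """Hypothetical stand-in for XGBRegressor; any plain Python object behaves the same."""


def get_model_features(model: object) -> Optional[List[str]]:
    # getattr with a default returns None when the attribute was never set,
    # for example on a freshly loaded model or when model itself is None.
    return getattr(model, "selected_features", None)


regressor = DummyRegressor()
# Plain assignment instead of setattr(regressor, "selected_features", ...).
regressor.selected_features = ["f0", "f3"]

print(get_model_features(regressor))
print(get_model_features(DummyRegressor()))
```

The default also covers the case where the model was restored with load_model and the attribute no longer exists.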
The underlying XGBoost model saver/loader (via JSON/UBJSON) does not contain any logic for maintaining custom model metadata. It stores only real model data, which is actually used by XGBoost itself.
If you want to save Scikit-Learn wrappers with custom attributes, then you must keep using the pickle data format. There is no way around that.
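If pickle is not acceptable, one option that stays outside the XGBoost saver is to write the custom attribute to a separate file next to model.json and reattach it after loading. This is only a sketch under assumptions: the helper names save_selected_features and load_selected_features and the model.features.json naming scheme are made up for illustration and are not part of XGBoost.

```python
import json
from pathlib import Path
from typing import List


def _sidecar_path(model_path: str) -> Path:
    # "model.json" -> "model.features.json" in the same directory (hypothetical naming scheme).
    path = Path(model_path)
    return path.with_name(path.stem + ".features.json")


def save_selected_features(features: List[str], model_path: str) -> Path:
    # Write the feature list as plain JSON next to the model file.
    sidecar = _sidecar_path(model_path)
    sidecar.write_text(json.dumps({"selected_features": features}))
    return sidecar


def load_selected_features(model_path: str) -> List[str]:
    # Read the feature list back from the file written by save_selected_features.
    return json.loads(_sidecar_path(model_path).read_text())["selected_features"]
```

In use, save_selected_features would be called right after regressor.save_model(fname="model.json"), and the result of load_selected_features would be assigned back to the selected_features attribute after load_model.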