I am trying to convert XGBoost Shapley values into a SHAP explainer object. Using the example [here][1] with the built-in SHAP library takes days to run (even on a subsampled dataset), while the XGBoost library takes a few minutes. However, I would like to output a beeswarm graph similar to the one displayed in the example [here][2].
My thought was that I could use the XGBoost library to recover the Shapley values and then plot them using the SHAP library, but the beeswarm plot requires an explainer object. How can I convert my XGBoost booster object into an explainer object?
Here's what I tried:
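import shap
import xgboost
# model is an already fitted XGBoost scikit-learn style estimator
booster = model.get_booster()
d_test = xgboost.DMatrix(X_test[0:100], y_test[0:100])
# pred_contribs=True returns a plain NumPy array, not a SHAP Explanation
shap_values = booster.predict(d_test, pred_contribs=True)
shap.plots.beeswarm(shap_values)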
Which returns:
TypeError: The beeswarm plot requires an `Explanation` object as the `shap_values` argument.
To clarify, I would like to create the explainer object from values generated by the XGBoost built-in library, if possible. Avoiding the shap.Explainer or shap.TreeExplainer calls is a priority because they take days to return rather than minutes.

[1]: https://shap.readthedocs.io/en/latest/example_notebooks/tabular_examples/tree_based_models/Python%20Version%20of%20Tree%20SHAP.html
[2]: https://shap.readthedocs.io/en/latest/example_notebooks/api_examples/plots/beeswarm.html#A-simple-beeswarm-summary-plot
If you're after building an `Explanation` object (rather than an `Explainer`, as you stated in your question), then you can do the following:
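import xgboost as xgb
import shap
from sklearn.model_selection import train_test_split
X, y = shap.datasets.california()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
d_train = xgb.DMatrix(X_train, y_train)
d_test = xgb.DMatrix(X_test, y_test)
# "device": "cuda" needs a GPU; remove that key to train on the CPU
params = {"objective": "reg:squarederror", "tree_method": "hist", "device": "cuda"}
model = xgb.train(params, d_train, 100)
# Shape is (n_samples, n_features + 1); the last column is the bias term
shap_values = model.predict(d_test, pred_contribs=True)
# Drop the bias column from the values and pass it as the base values instead
exp = shap.Explanation(
    shap_values[:, :-1],
    base_values=shap_values[:, -1],
    data=X_test,
    feature_names=X.columns,
)
# The default summary plot is the beeswarm-style dot plot
shap.summary_plot(exp)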
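The slicing above works because `pred_contribs=True` returns one column per feature plus a trailing bias column. As a minimal sketch of that step, assuming a single-output model (regression or binary classification), the helpers below separate the two parts and wrap them in an `Explanation`. The names `split_contribs` and `to_explanation` are hypothetical and not part of either library, and `shap` is imported lazily so the splitting logic can be checked on its own.

```python
import numpy as np


def split_contribs(contribs):
    """Split a pred_contribs matrix into (feature values, bias column).

    Assumes a 2-D array of shape (n_samples, n_features + 1), which is
    what a single-output booster returns with pred_contribs=True.
    """
    contribs = np.asarray(contribs)
    if contribs.ndim != 2 or contribs.shape[1] < 2:
        raise ValueError("expected a 2-D array with a trailing bias column")
    return contribs[:, :-1], contribs[:, -1]


def to_explanation(contribs, X, feature_names):
    """Wrap XGBoost contributions in a shap.Explanation (hypothetical helper)."""
    import shap  # lazy import: only needed when building the Explanation

    values, base_values = split_contribs(contribs)
    return shap.Explanation(
        values=values,
        base_values=base_values,
        data=np.asarray(X),  # plain array of the rows that were explained
        feature_names=list(feature_names),
    )
```

With the objects from the answer, `shap.plots.beeswarm(to_explanation(shap_values, X_test, X.columns))` should then draw the plot without calling `shap.Explainer` or `shap.TreeExplainer`. For multiclass models the contribution array has an extra class dimension, so this sketch would need adapting.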