Tags: python, machine-learning, xgboost, shap

Converting XGBoost Shapley values to SHAP's Explanation object


I am trying to convert XGBoost Shapley values into a SHAP explainer object. Using the example [here][1] with the built-in SHAP library takes days to run (even on a subsampled dataset), while the XGBoost library takes a few minutes. However, I would like to output a beeswarm graph similar to what's displayed in the example [here][2].

My thought was that I could use the XGBoost library to recover the Shapley values and then plot them using the SHAP library, but the beeswarm plot requires an explainer object. How can I convert my XGBoost booster object into an explainer object?

Here's what I tried:

import shap
import xgboost

booster = model.get_booster()  # model is a fitted XGBRegressor/XGBClassifier
d_test = xgboost.DMatrix(X_test[0:100], y_test[0:100])
shap_values = booster.predict(d_test, pred_contribs=True)
shap.plots.beeswarm(shap_values)

Which returns:

TypeError: The beeswarm plot requires an `Explanation` object as the `shap_values` argument.
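Presumably this is because `predict(..., pred_contribs=True)` returns a plain NumPy array (with one extra column at the end for the bias term) rather than an `Explanation`:

print(type(shap_values))   # <class 'numpy.ndarray'>
print(shap_values.shape)   # (100, n_features + 1); the last column is the bias/expected value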

To clarify, I would like to create the explainer object out of values generated by XGBoost's built-in prediction, if possible. Avoiding the shap.Explainer or shap.TreeExplainer calls is a priority because they take much longer (days) to return rather than minutes.

[1]: https://shap.readthedocs.io/en/latest/example_notebooks/tabular_examples/tree_based_models/Python%20Version%20of%20Tree%20SHAP.html
[2]: https://shap.readthedocs.io/en/latest/example_notebooks/api_examples/plots/beeswarm.html#A-simple-beeswarm-summary-plot


Solution

  • If you're after building an Explanation object (rather than an Explainer, as you stated in your question), you can do the following:

    import xgboost as xgb
    import shap
    from sklearn.model_selection import train_test_split
    
    X, y = shap.datasets.california()
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
    
    d_train = xgb.DMatrix(X_train, y_train)
    d_test = xgb.DMatrix(X_test, y_test)
    
    # "device": "cuda" assumes a GPU is available; drop it to train on CPU
    params = {"objective": "reg:squarederror", "tree_method": "hist", "device": "cuda"}
    
    model = xgb.train(params, d_train, 100)
    
    # pred_contribs=True returns an array of shape (n_samples, n_features + 1);
    # the last column is the bias (expected value), so it is sliced off below
    shap_values = model.predict(d_test, pred_contribs=True)
    
    exp = shap.Explanation(shap_values[:, :-1], data=X_test, feature_names=X.columns)
    shap.summary_plot(exp)
    

    (beeswarm-style summary plot of the SHAP values for the test set)
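
  • If you specifically want `shap.plots.beeswarm` (as in your question), you can wrap the same array and pass the bias column as `base_values`; this is a minimal sketch using the standard `shap.Explanation` arguments and the `X_test` DataFrame from the split above:

    exp = shap.Explanation(
        shap_values[:, :-1],             # per-feature SHAP values
        base_values=shap_values[:, -1],  # bias column = expected value of the model output
        data=X_test,
        feature_names=X.columns,
    )
    shap.plots.beeswarm(exp)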