Using TreeExplainer in SHAP, I cannot plot a waterfall plot. The error message is:
---> 17 shap.plots.waterfall(shap_values[0], max_display=14)
TypeError: The waterfall plot requires an `Explanation` object as the
`shap_values` argument.
Since my model is tree-based (xgb.XGBClassifier), I use TreeExplainer. If I use Explainer instead of TreeExplainer, I can plot the waterfall plot. My code is given below:
import pandas as pd
data = {
    'a': [1, 2, 3, 3, 2, 1, 4, 5, 6, 7, 8, 1, 2, 3, 3, 2, 1, 4, 5, 6, 7, 8],
    'b': [2, 1, 2, 3, 4, 6, 5, 8, 7, 9, 10, 2, 1, 2, 3, 4, 6, 5, 8, 7, 9, 10],
    'c': [1, 5, 2, 4, 3, 9, 6, 8, 7, 10, 1, 1, 5, 2, 4, 3, 9, 6, 8, 7, 10, 1],
    'd': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1],
    'e': [1, 2, 3, 4, 3, 2, 1, 5, 4, 2, 1, 1, 2, 3, 4, 3, 2, 1, 5, 4, 2, 1],
    'f': [1, 1, 2, 1, 2, 2, 3, 3, 3, 2, 1, 1, 1, 2, 1, 2, 2, 3, 3, 3, 2, 1],
    'g': [3, 3, 2, 1, 3, 2, 1, 1, 1, 2, 2, 3, 3, 2, 1, 3, 2, 1, 1, 1, 2, 2],
    'h': [1, 2, 1, 2, 3, 4, 5, 3, 4, 5, 5, 1, 2, 1, 2, 3, 4, 5, 3, 4, 5, 5],
    'i': [1, 2, 1, 2, 3, 4, 5, 6, 5, 4, 6, 1, 2, 1, 2, 3, 4, 5, 6, 5, 4, 6],
    'j': [5, 4, 3, 2, 1, 1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1, 1, 2, 3, 4, 5, 6],
    'k': [3, 3, 2, 1, 4, 3, 2, 2, 2, 1, 1, 3, 3, 2, 1, 4, 3, 2, 2, 2, 1, 1],
    'r': [1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1]
}
df = pd.DataFrame(data)
X = df.iloc[:,[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]
y = df.iloc[:,11]
from sklearn.model_selection import train_test_split, GridSearchCV
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.30, random_state = 42)
import xgboost as xgb
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
param_grid = {
    'max_depth': [6],
    'n_estimators': [500],
    'learning_rate': [0.3]
}
grid_search_xgboost = GridSearchCV(
    estimator=xgb.XGBClassifier(),
    param_grid=param_grid,
    cv=3,
    verbose=2,
    n_jobs=-1
)
grid_search_xgboost.fit(X_train, y_train)
print("Best Parameters:", grid_search_xgboost.best_params_)
best_model_xgboost = grid_search_xgboost.best_estimator_
import shap
explainer = shap.TreeExplainer(best_model_xgboost)
shap_values = explainer.shap_values(X_train)
shap.summary_plot(shap_values, X_train, plot_type="bar")
shap.summary_plot(shap_values, X_train)
for name in X_train.columns:
    shap.dependence_plot(name, shap_values, X_train)
shap.force_plot(explainer.expected_value, shap_values[0], X_train.iloc[0], matplotlib=True)
shap.decision_plot(explainer.expected_value, shap_values[:10], X_train.iloc[:10])
shap.plots.waterfall(shap_values[0], max_display=14)
Where is the problem?
Instead of feeding the SHAP values in as a numpy.ndarray, pass an Explanation object:
import pandas as pd
import xgboost as xgb
import shap
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
data = {
    'a': [1, 2, 3, 3, 2, 1, 4, 5, 6, 7, 8, 1, 2, 3, 3, 2, 1, 4, 5, 6, 7, 8],
    'b': [2, 1, 2, 3, 4, 6, 5, 8, 7, 9, 10, 2, 1, 2, 3, 4, 6, 5, 8, 7, 9, 10],
    'c': [1, 5, 2, 4, 3, 9, 6, 8, 7, 10, 1, 1, 5, 2, 4, 3, 9, 6, 8, 7, 10, 1],
    'd': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1],
    'e': [1, 2, 3, 4, 3, 2, 1, 5, 4, 2, 1, 1, 2, 3, 4, 3, 2, 1, 5, 4, 2, 1],
    'f': [1, 1, 2, 1, 2, 2, 3, 3, 3, 2, 1, 1, 1, 2, 1, 2, 2, 3, 3, 3, 2, 1],
    'g': [3, 3, 2, 1, 3, 2, 1, 1, 1, 2, 2, 3, 3, 2, 1, 3, 2, 1, 1, 1, 2, 2],
    'h': [1, 2, 1, 2, 3, 4, 5, 3, 4, 5, 5, 1, 2, 1, 2, 3, 4, 5, 3, 4, 5, 5],
    'i': [1, 2, 1, 2, 3, 4, 5, 6, 5, 4, 6, 1, 2, 1, 2, 3, 4, 5, 6, 5, 4, 6],
    'j': [5, 4, 3, 2, 1, 1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1, 1, 2, 3, 4, 5, 6],
    'k': [3, 3, 2, 1, 4, 3, 2, 2, 2, 1, 1, 3, 3, 2, 1, 4, 3, 2, 2, 2, 1, 1],
    'r': [1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1]
}
df = pd.DataFrame(data)
X = df.iloc[:,[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]
y = df.iloc[:,11]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.30, random_state = 42)
param_grid = {
    'max_depth': [6],
    'n_estimators': [500],
    'learning_rate': [0.3]
}
grid_search_xgboost = GridSearchCV(
    estimator=xgb.XGBClassifier(),
    param_grid=param_grid,
    cv=3,
    verbose=2,
    n_jobs=-1
)
grid_search_xgboost.fit(X_train, y_train)
print("Best Parameters:", grid_search_xgboost.best_params_)
best_model_xgboost = grid_search_xgboost.best_estimator_
explainer = shap.TreeExplainer(best_model_xgboost)
exp = explainer(X_train)  # <-- call the explainer itself to get an Explanation
print(type(exp))
shap.plots.waterfall(exp[0])
<class 'shap._explanation.Explanation'>
Why? Because SHAP has two plotting interfaces: an old one and a new one. The old one (your summary, dependence, force, and decision plots) expects the SHAP values as a NumPy ndarray, which is what explainer.shap_values() returns. The new one (shap.plots.*) expects an Explanation object, which is what calling the explainer directly returns; this is, by the way, exactly what the error message says.