I use the code below to build an ML model in Python with GridSearchCV. Now I need to make a SHAP summary plot. How can I do that once the model has been fitted through GridSearchCV?
import pandas as pd
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import GridSearchCV
np.random.seed(42)
# generate some dummy data
df = pd.DataFrame(data=np.random.normal(loc=0, scale=1, size=(100, 3)), columns=['x1', 'x2', 'x3'])
df['y'] = np.where(df.mean(axis=1) > 0, 1, 0)
# find the best model
X = df.drop(labels=['y'], axis=1)
y = df['y']
parameters = {
    'n_estimators': [100, 500, 1000],
    'subsample': [0.01, 0.05]
}
clf = GridSearchCV(
    param_grid=parameters,
    estimator=XGBClassifier(random_state=42),
    scoring='roc_auc',
    cv=4,
    verbose=0
)
clf.fit(X, y)
# get the feature importances
importances = clf.best_estimator_.get_booster().get_score(importance_type='gain')
print(importances)
clf is a fitted GridSearchCV object. I can compute the feature importances from it, but how do I build a SHAP summary plot when the model comes from a grid search?
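By default (refit=True), GridSearchCV retrains the best parameter combination on the full data and exposes that model as clf.best_estimator_. Pass that model to SHAP rather than the GridSearchCV object itself: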
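import shap
# the XGBClassifier refit with the best parameters found by the search
model = clf.best_estimator_
explainer = shap.Explainer(model)
# SHAP values for every row of X
shap_values = explainer(X)
# summary plot of per-feature SHAP values
shap.summary_plot(shap_values, X)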