Tags: python, dataframe, machine-learning, classification, shap

SHAP Linear model waterfall with KernelExplainer and LinearExplainer


I am working on a binary classification problem and trying to explain my model using the SHAP framework.

I am using the logistic regression algorithm, and I would like to explain the model using both KernelExplainer and LinearExplainer.

So, I tried the code below, adapted from an SO answer here:

import shap
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer
from shap import KernelExplainer, Explanation
from shap.plots import waterfall

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

idx = 9
model = LogisticRegression().fit(X, y)
background = shap.maskers.Independent(X, max_samples=100)
explainer = KernelExplainer(model, background)
sv = explainer(X.iloc[[idx]])   # pass the row of interest as a one-row df
exp = Explanation(
    sv.values[:, :, 1],         # class to explain
    sv.base_values[:, 1],
    data=X.iloc[[idx]].values,
    feature_names=X.columns,
)
waterfall(exp[0])

         

This threw the error shown below:

AssertionError: Unknown type passed as data object: <class 'shap.maskers._tabular.Independent'>

How can I explain a logistic regression model using SHAP's KernelExplainer and LinearExplainer?


Solution

  • Calculation-wise, the following will do:

    from sklearn.linear_model import LogisticRegression
    from sklearn.datasets import load_breast_cancer

    from shap import LinearExplainer, KernelExplainer, Explanation
    from shap.plots import waterfall
    from shap.maskers import Independent

    X, y = load_breast_cancer(return_X_y=True, as_frame=True)

    idx = 9
    model = LogisticRegression().fit(X, y)

    # KernelExplainer needs a callable and plain background data (not a masker);
    # model.predict explains the hard 0/1 prediction
    explainer = KernelExplainer(model.predict, X)
    sv = explainer.shap_values(X.loc[[idx]])   # pass the row of interest as a one-row df

    exp = Explanation(sv, explainer.expected_value, data=X.loc[[idx]].values, feature_names=X.columns)
    waterfall(exp[0])
    

    [waterfall plot produced by the KernelExplainer snippet above]
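
    If you want to explain the predicted probability of the positive class (which is what the question's sv.values[:, :, 1] was aiming at) instead of the hard 0/1 label that model.predict returns, you can wrap predict_proba. This is a minimal sketch following the same pattern as above; the wrapper predict_pos and the _p names are my own, not part of the original answer:

    def predict_pos(x):                       # assumed wrapper name, not in the original answer
        return model.predict_proba(x)[:, 1]   # probability of class 1

    explainer_p = KernelExplainer(predict_pos, X)
    sv_p = explainer_p.shap_values(X.loc[[idx]])
    exp_p = Explanation(sv_p, explainer_p.expected_value, data=X.loc[[idx]].values, feature_names=X.columns)
    waterfall(exp_p[0])   # the plot now explains the predicted class-1 probability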

    Note: KernelExplainer does not accept maskers as its background data, and because X keeps the default integer index here, loc and iloc select the same row.
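
    Because of that, if using the full X as the background makes KernelExplainer too slow, you can pass a plain subsample instead of a masker, for example with shap.sample, the helper SHAP's own warning suggests for large backgrounds. A sketch (the sample size of 100 is an assumption mirroring max_samples=100 from the question):

    import shap

    background_sample = shap.sample(X, 100)   # plain DataFrame subsample, not a masker
    explainer = KernelExplainer(model.predict, background_sample)
    sv = explainer.shap_values(X.loc[[idx]])
    # build an Explanation and call waterfall exactly as above

    LinearExplainer, on the other hand, does accept a masker: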

    background = Independent(X, max_samples=100)
    explainer = LinearExplainer(model, background)
    sv = explainer(X.loc[[idx]])   # pass the row of interest by index
    waterfall(sv[0])
    

    [waterfall plot produced by the LinearExplainer snippet above]

    Note that LinearExplainer's result can be passed to waterfall "as-is": calling the explainer returns an Explanation object, so no manual wrapping into Explanation is needed.
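
    As a quick sanity check (my own addition, assuming LinearExplainer's default behaviour of explaining the raw margin, i.e. the log-odds, of a scikit-learn LogisticRegression), the base value plus the SHAP values of the explained row should reproduce model.decision_function for that row:

    import numpy as np

    reconstructed = sv.base_values[0] + sv.values[0].sum()
    margin = model.decision_function(X.loc[[idx]])[0]   # log-odds of the positive class
    print(np.isclose(reconstructed, margin))            # expected: True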