Tags: python, matplotlib, deep-learning, shap

Is there a way to change the order of the features in SHAP's summary_plot?


I want to control the order of the features in shap.summary_plot. Here is some working code. I would like to choose the order of the variables myself instead of having them sorted by importance.

import matplotlib.pyplot as plt
import numpy as np
import shap
from tensorflow import keras
from tensorflow.keras import layers
X = np.array([[(1,2,3,3,1),(3,2,1,3,2),(3,2,2,3,3),(2,2,1,1,2),(2,1,1,1,1)],
              [(4,5,6,4,4),(5,6,4,3,2),(5,5,6,1,3),(3,3,3,2,2),(2,3,3,2,1)],
              [(7,8,9,4,7),(7,7,6,7,8),(5,8,7,8,8),(6,7,6,7,8),(5,7,6,6,6)],
              [(7,8,9,8,6),(6,6,7,8,6),(8,7,8,8,8),(8,6,7,8,7),(8,6,7,8,8)],
              [(4,5,6,5,5),(5,5,5,6,4),(6,5,5,5,6),(4,4,3,3,3),(5,5,4,4,5)],
              [(4,5,6,5,5),(5,5,5,6,4),(6,5,5,5,6),(4,4,3,3,3),(5,5,4,4,5)],
              [(1,2,3,3,1),(3,2,1,3,2),(3,2,2,3,3),(2,2,1,1,2),(2,1,1,1,1)]])
y = np.array([0, 1, 2, 2, 1, 1, 0])

# Updated model with correct input shape
model = keras.Sequential([
    layers.Conv1D(128, kernel_size=3, activation='relu',input_shape=(5,5)),
    layers.MaxPooling1D(pool_size=2),
    layers.LSTM(128, return_sequences=True),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(3, activation='softmax')  # One output unit per class: y has 3 classes (0, 1, 2)
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(X, y, epochs=10)

explainer = shap.GradientExplainer(model, X)
shap_values = explainer.shap_values(X)
#print(shap_values)

cls = 0
idx = 0
feature_names = ["Feature1","Feature2","Feature4","Prova4","Feature5"]
shap.summary_plot(shap_values[cls][:,idx,:], X[:,idx,:], plot_type="bar", feature_names=feature_names)

Is there a way to choose the order in which the features are plotted?
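For context, as far as I know the default (sort=True) order in the legacy summary_plot comes from ranking features by their total absolute SHAP value (the sum over samples, which gives the same ranking as the mean), with the largest at the top. Below is a minimal NumPy sketch that mirrors that ranking. It assumes a 2-D (n_samples, n_features) matrix such as shap_values[cls][:, idx, :] from the question. The function name default_summary_plot_order and the toy values are hypothetical, and the sketch ignores max_display and tie-breaking.

```python
import numpy as np


def default_summary_plot_order(shap_matrix, feature_names):
    """Hypothetical helper: return feature names top-to-bottom in the order
    summary_plot uses by default, i.e. ranked by the sum of |SHAP| values.

    Assumes shap_matrix has shape (n_samples, n_features). Ignores max_display.
    """
    importance = np.abs(shap_matrix).sum(axis=0)  # one importance score per feature
    ranked = np.argsort(importance)[::-1]         # indices, most important first
    return [feature_names[i] for i in ranked]


# Hypothetical toy SHAP values: 3 samples x 3 features.
toy_shap = np.array([[0.1, -0.5, 0.2],
                     [0.0, 0.4, -0.2],
                     [0.2, -0.3, 0.1]])
print(default_summary_plot_order(toy_shap, ["a", "b", "c"]))  # ['b', 'c', 'a']
```

Because this ranking is computed inside the plotting function, the order cannot be changed through feature_names alone. The data has to be reordered and the internal sort turned off.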


Solution

  • A simple workaround is to pass sort=False to shap.summary_plot and put the features in the order you want yourself. You can reorder the columns with NumPy indexing (for example using argsort), or temporarily convert the SHAP values to a pandas DataFrame and reorder its columns by name:

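    import pandas as pd
    import shap
    
    # Features dataframe (one column per feature)
    features_df = pd.DataFrame(..., columns=["a", "b", "c"])
    
    # Decide the order you want for the plot (top to bottom)
    feature_names_ordered_for_plot = ["b", "a", "c"]
    
    # Reorder the feature columns and the SHAP values the same way.
    # shap_values must be a 2-D array of shape (n_samples, n_features)
    # whose columns follow the order of features_df.columns.
    features_df_ordered_for_plot = features_df[feature_names_ordered_for_plot]
    shap_values_ordered_for_plot = pd.DataFrame(shap_values, columns=features_df.columns)[feature_names_ordered_for_plot].to_numpy()
    
    # Plot without re-sorting the features by importance
    shap.summary_plot(
        shap_values_ordered_for_plot,
        features_df_ordered_for_plot,
        sort=False
    )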
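The same idea can be applied directly to the arrays in the question with plain NumPy column indexing, without going through a DataFrame. Below is a minimal sketch. The helper name reorder_for_summary_plot and the random stand-in arrays are hypothetical, and it assumes the 2-D slices shap_values[cls][:, idx, :] and X[:, idx, :] used in the question. As far as I can tell from shap's legacy implementation, with sort=False the first column is drawn at the top, and max_display (default 20) still keeps only the first 20 columns.

```python
import numpy as np


def reorder_for_summary_plot(shap_matrix, feature_matrix, feature_names, desired_order):
    """Hypothetical helper: reorder the columns of a 2-D SHAP matrix and the
    matching feature matrix so that summary_plot(..., sort=False) shows them
    in desired_order.

    shap_matrix, feature_matrix: arrays of shape (n_samples, n_features).
    desired_order: the feature names, in the order you want them plotted.
    """
    feature_names = list(feature_names)
    if sorted(desired_order) != sorted(feature_names):
        raise ValueError("desired_order must be a permutation of feature_names")
    # Column index of each requested feature in the original arrays.
    cols = [feature_names.index(name) for name in desired_order]
    return shap_matrix[:, cols], feature_matrix[:, cols], list(desired_order)


# Random stand-ins with the question's shapes: 7 samples, 5 timesteps, 5 features.
rng = np.random.default_rng(0)
shap_3d = rng.normal(size=(7, 5, 5))  # plays the role of shap_values[cls]
X_3d = rng.normal(size=(7, 5, 5))     # plays the role of X
idx = 0
feature_names = ["Feature1", "Feature2", "Feature4", "Prova4", "Feature5"]
my_order = ["Prova4", "Feature1", "Feature5", "Feature2", "Feature4"]

sv, feats, names = reorder_for_summary_plot(
    shap_3d[:, idx, :], X_3d[:, idx, :], feature_names, my_order
)
# With shap installed, the plot call would then be:
# shap.summary_plot(sv, feats, plot_type="bar", feature_names=names, sort=False)
```

Reordering both arrays with the same column list keeps each SHAP column aligned with its feature values. If you need more than 20 features, max_display should be raised to match.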