Tags: machine-learning, scikit-learn, shap

Why do I get a different expected_value when I include the training data in TreeExplainer?


Passing the training data to SHAP's TreeExplainer gives a different expected_value for a scikit-learn GradientBoostingRegressor.

Reproducible example (run in Google Colab):

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
import numpy as np
import shap

shap.__version__
# 0.37.0

X, y = make_regression(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

gbt = GradientBoostingRegressor(random_state=0)
gbt.fit(X_train, y_train)

# mean prediction:
mean_pred_gbt = np.mean(gbt.predict(X_train))
mean_pred_gbt
# -11.534353657511172

# explainer without data
gbt_explainer = shap.TreeExplainer(gbt)
gbt_explainer.expected_value
# array([-11.53435366])

np.isclose(mean_pred_gbt, gbt_explainer.expected_value)
# array([ True])

# explainer with training data
gbt_data_explainer = shap.TreeExplainer(model=gbt, data=X_train) # specifying feature_perturbation does not change the result
gbt_data_explainer.expected_value
# -23.564797322079635

So the expected value computed with the training data (gbt_data_explainer.expected_value) is quite different from the one computed without it (gbt_explainer.expected_value).

Both approaches are additive and consistent when used with the (obviously different) respective shap_values:

np.abs(gbt_explainer.expected_value + gbt_explainer.shap_values(X_train).sum(1) - gbt.predict(X_train)).max() < 1e-4
# True

np.abs(gbt_data_explainer.expected_value + gbt_data_explainer.shap_values(X_train).sum(1) - gbt.predict(X_train)).max() < 1e-4
# True

but I wonder why they do not produce the same expected_value, and why gbt_data_explainer.expected_value is so far from the mean prediction.

What am I missing here?


Solution

  • Apparently shap subsamples the background data down to 100 rows when data is passed, then runs those rows through the trees to reset the sample counts for each node. So the -23.5... being reported is the average model output over those 100 rows.

    The data is passed to an Independent masker, which does the subsampling:
    https://github.com/slundberg/shap/blob/v0.37.0/shap/explainers/_tree.py#L94
    https://github.com/slundberg/shap/blob/v0.37.0/shap/explainers/_explainer.py#L68
    https://github.com/slundberg/shap/blob/v0.37.0/shap/maskers/_tabular.py#L216
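
    A shift of this size from subsampling is plausible here: the model's outputs have a large spread, and the standard error of a 100-row background mean is std/sqrt(100). A minimal numpy sketch, where the scale of 150 is an assumed stand-in for this model's prediction spread (not measured from it):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    # stand-in for the model's predictions: mean near -11.5, large (assumed) spread
    preds = rng.normal(loc=-11.5, scale=150.0, size=800)

    full_mean = preds.mean()  # close to -11.5
    # mean over a random 100-row subsample, as the masker's subsampling would take
    sub_mean = rng.choice(preds, size=100, replace=False).mean()
    # standard error of a 100-row subsample mean: with std ~150 this is ~15,
    # so a subsample mean a dozen units away from the full mean is unremarkable
    se = preds.std() / np.sqrt(100)
    ```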

    Running

    from shap import maskers
    
    another_gbt_explainer = shap.TreeExplainer(
        gbt,
        data=maskers.Independent(X_train, max_samples=800),
        feature_perturbation="tree_path_dependent"
    )
    another_gbt_explainer.expected_value
    

    gets back to

    -11.534353657511172
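
    Equivalently, one can keep the default interventional perturbation and simply make max_samples cover the whole background set, so no rows are dropped and the expected value is the mean model output over all of X_train. A self-contained sketch re-fitting the same model as in the question (behaviour checked against shap's documented API, not guaranteed across versions):

    ```python
    from sklearn.datasets import make_regression
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import GradientBoostingRegressor
    import numpy as np
    import shap
    from shap import maskers

    # same setup as in the question
    X, y = make_regression(n_samples=1000, n_features=10, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    gbt = GradientBoostingRegressor(random_state=0)
    gbt.fit(X_train, y_train)

    # max_samples >= len(X_train) means the Independent masker keeps every row,
    # so the interventional expected value is the mean prediction on X_train
    explainer = shap.TreeExplainer(
        gbt, data=maskers.Independent(X_train, max_samples=len(X_train))
    )
    print(np.isclose(explainer.expected_value, gbt.predict(X_train).mean()))
    ```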