Tags: python, matplotlib, scikit-learn

PartialDependenceDisplay.from_estimator plot shows contour lines that all have value 0


I need to evaluate the two-way interaction between two variables of a fitted regressor model. I used PartialDependenceDisplay.from_estimator to plot it, but the contour lines inside the plot all have the value 0. I'm not sure what might cause this. I checked the data and the model, and there are no problems loading either; other two-variable combinations show the same issue.

import matplotlib.pyplot as plt
from sklearn.inspection import partial_dependence, PartialDependenceDisplay

model = load_model(model_path)
model_features = model.feature_name_

fig, ax = plt.subplots(figsize=(10, 5))
X = training_data[model_features]
PartialDependenceDisplay.from_estimator(
    model, X,
    features=[('temperature', 'speed')],
    ax=ax, n_jobs=-1, grid_resolution=20,
)

[Image: two-way partial dependence plot where every contour label reads 0.00]


Solution

  • Most probably your contour values are all smaller in magnitude than 0.005: contour labels are formatted with "%2.2f", so any such value is rendered as 0.00, and there appears to be no documented way of changing this format (to confirm that the values really are that small, see the check after the example below). The only workaround I could think of is to retrieve the labels and their values and replace the label texts:

    import matplotlib.pyplot as plt
    from matplotlib.text import Text
    import numpy as np
    from sklearn.datasets import make_friedman1
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.inspection import PartialDependenceDisplay

    X, y = make_friedman1()
    clf = GradientBoostingRegressor(n_estimators=10).fit(X, y)

    pdd = PartialDependenceDisplay.from_estimator(clf, X, [0, (0, 1)])

    # Walk over the Text children of the contour Axes, parse each label back
    # into a number, and replace it with the unrounded contour level.
    for c in pdd.axes_[0][1].get_children():
        if isinstance(c, Text):
            try:
                label_value = float(c.get_text())
            except ValueError:
                continue  # not a contour label
            idx = np.argmin(np.abs(pdd.contours_[0][1].levels - label_value))
            c.set_text(f'{pdd.contours_[0][1].levels[idx]:g}')
    

    [Image: contour labels now show the unrounded level values]
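
    To confirm on the question's own model that the raw values really are this small, you can inspect the grid that scikit-learn computes. A minimal sketch, assuming a recent scikit-learn where partial_dependence returns a Bunch with an "average" key, and reusing model and X from the question:

    from sklearn.inspection import partial_dependence

    # Raw two-way partial dependence for the same feature pair as the plot.
    pd_result = partial_dependence(
        model, X, [('temperature', 'speed')], grid_resolution=20
    )
    avg = pd_result['average']  # shape (1, 20, 20)
    # If the whole grid sits within (-0.005, 0.005), every "%2.2f" contour
    # label rounds to 0.00, reproducing the symptom in the question.
    print(avg.min(), avg.max())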

    Update 1

    The above method doesn't work if all existing labels are identical (for example, when every label reads 0.00), because the parsed value no longer identifies a unique level. A somewhat unreliable quick-and-dirty workaround is to rely on the fact that the label texts are added to the Axes in ascending order of their levels, and that the first and last levels are not labelled. This leads to the following example:

    import matplotlib.pyplot as plt
    from matplotlib.text import Text
    from sklearn.datasets import make_friedman1
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.inspection import PartialDependenceDisplay

    X, y = make_friedman1(random_state=42)
    clf = GradientBoostingRegressor(n_estimators=10).fit(X, y)

    pdd = PartialDependenceDisplay.from_estimator(clf, X, [0, (0, 1)])

    # Assume the labels were added in ascending level order and that the
    # first level is unlabelled, so start at index 1.
    i = 1
    for c in pdd.axes_[0][1].get_children():
        if isinstance(c, Text) and c.get_text():
            c.set_text(f'{pdd.contours_[0][1].levels[i]:g}')
            i += 1
    

    [Image: contour labels reconstructed from the level values]
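
    A less order-dependent variant is sketched below; it assumes matplotlib's ContourSet still carries its internal labelTexts/labelCValues bookkeeping, which is present in current matplotlib versions but undocumented:

    # After clabel() has run, the ContourSet pairs each label Text with the
    # level value it belongs to. These attributes are matplotlib internals
    # and may change in future versions.
    cs = pdd.contours_[0][1]
    for text, level in zip(cs.labelTexts, cs.labelCValues):
        text.set_text(f'{level:g}')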

    Update 2

    Another (reliable but still hacky) possibility is to overwrite the clabel function used by scikit-learn with your own version that uses an appropriate format specification. In order to get hold of this function, you have to provide your own Axes instances to PartialDependenceDisplay.from_estimator:

    import matplotlib.pyplot as plt
    from sklearn.datasets import make_friedman1
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.inspection import PartialDependenceDisplay

    fig, axes = plt.subplots(ncols=2)

    # Monkey-patch clabel on the Axes that will hold the contour plot so that
    # scikit-learn's hard-coded fmt='%2.2f' is replaced with a finer format.
    original_clabel = axes[1].clabel
    def new_clabel(CS, **kwargs):
        kwargs.pop('fmt', None)  # drop the format scikit-learn passes in
        return original_clabel(CS, fmt='%2.5f', **kwargs)
    axes[1].clabel = new_clabel

    X, y = make_friedman1(random_state=42)
    clf = GradientBoostingRegressor(n_estimators=10).fit(X, y)

    pdd = PartialDependenceDisplay.from_estimator(clf, X, [0, (0, 1)], ax=axes)
    

    [Image: contour labels rendered with the %2.5f format]
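
    Because this swaps out a method on a live Axes instance, it is worth restoring the original afterwards so that later clabel calls on that Axes behave normally again:

    # Undo the monkey-patch once the display has been created.
    axes[1].clabel = original_clabel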