scikit-learn, ensemble-learning

Returning standard deviation with `BaggingRegressor`


Is there a way to return the standard deviation of the predictions using `sklearn.ensemble.BaggingRegressor`?

All the examples I have looked at only return the mean prediction.


Solution

  • You can always get the predictions of each individual estimator in the ensemble. The fitted estimators are accessible through the `estimators_` attribute of the ensemble, and you can process their predictions however you need (compute the mean, standard deviation, etc.).

    Adapting the example from the documentation, with an ensemble of 10 SVR base estimators:

    import numpy as np
    from sklearn.svm import SVR
    from sklearn.ensemble import BaggingRegressor
    from sklearn.datasets import make_regression
    
    X, y = make_regression(n_samples=100, n_features=4,
                           n_informative=2, n_targets=1,
                           random_state=0, shuffle=False)
    # note: `estimator` was named `base_estimator` before scikit-learn 1.2
    regr = BaggingRegressor(estimator=SVR(),
                            n_estimators=10, random_state=0).fit(X, y)
    
    
    regr.predict([[0, 0, 0, 0]]) # get (mean) prediction for a single sample, [0, 0, 0, 0]
    # array([-2.87202411])
    
    # get the predictions from each individual member of the ensemble using a list comprehension:
    
    raw_pred = [x.predict([[0, 0, 0, 0]]) for x in regr.estimators_]
    raw_pred
    # result:
    [array([-2.13003431]),
     array([-1.96224516]),
     array([-1.90429596]),
     array([-6.90647796]),
     array([-6.21360547]),
     array([-1.84318744]),
     array([1.82285686]),
     array([4.62508622]),
     array([-5.60320499]),
     array([-8.60513286])]
    
    # get the mean, and check that it matches the value returned above by the .predict method of the ensemble:
    
    np.mean(raw_pred)
    # -2.8720241079257436
    np.allclose(np.mean(raw_pred), regr.predict([[0, 0, 0, 0]])) # sanity check; allclose avoids an exact floating-point comparison
    # True
    
    # get the standard deviation:
    np.std(raw_pred)
    # 3.865135037828279
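
The same idea can be extended to several samples at once. Below is a minimal sketch, not an official scikit-learn API: the helper name `predict_with_std` is hypothetical. It stacks the predictions of all ensemble members and takes the mean and standard deviation along the estimator axis. It also uses the `estimators_features_` attribute, so each member only sees the feature columns it was trained on, which matters if you set `max_features` below 1.0. The base estimator is passed positionally so that the call does not depend on the `estimator` / `base_estimator` parameter name.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.svm import SVR


def predict_with_std(ensemble, X):
    """Return the per-sample mean and standard deviation across ensemble members.

    Hypothetical helper for illustration; `ensemble` is a fitted BaggingRegressor.
    """
    X = np.asarray(X)
    # Each member may have been fitted on a subset of the features,
    # so select the matching columns before calling its predict method.
    per_member = np.stack([
        est.predict(X[:, features])
        for est, features in zip(ensemble.estimators_,
                                 ensemble.estimators_features_)
    ])  # shape: (n_estimators, n_samples)
    return per_member.mean(axis=0), per_member.std(axis=0)


X, y = make_regression(n_samples=100, n_features=4, n_informative=2,
                       n_targets=1, random_state=0, shuffle=False)
regr = BaggingRegressor(SVR(), n_estimators=10, random_state=0).fit(X, y)

# mean and standard deviation for the first five samples
mean, std = predict_with_std(regr, X[:5])
```

Here `mean` should agree with `regr.predict(X[:5])` up to floating-point error. Note that `np.std` (and the `.std` method) uses `ddof=0` by default, i.e. the population standard deviation; pass `ddof=1` if you want the sample standard deviation across the ensemble members instead.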