Is there a way to return the standard deviation using sklearn.ensemble.BaggingRegressor?
Looking at several examples, all I have found is the mean prediction.
You can always get the underlying predictions from each estimator of the ensemble, which are accessible through its estimators_
attribute, and handle these predictions accordingly (compute the mean, standard deviation, etc.).
Adapting the example from the documentation, with an ensemble of 10 SVR base estimators:
import numpy as np
from sklearn.svm import SVR
from sklearn.ensemble import BaggingRegressor
from sklearn.datasets import make_regression
X, y = make_regression(n_samples=100, n_features=4,
                       n_informative=2, n_targets=1,
                       random_state=0, shuffle=False)
regr = BaggingRegressor(base_estimator=SVR(),
                        n_estimators=10, random_state=0).fit(X, y)
regr.predict([[0, 0, 0, 0]]) # get (mean) prediction for a single sample, [0, 0, 0, 0]
# array([-2.87202411])
# get the predictions from each individual member of the ensemble using a list comprehension:
raw_pred = [x.predict([[0, 0, 0, 0]]) for x in regr.estimators_]
raw_pred
# result:
[array([-2.13003431]),
 array([-1.96224516]),
 array([-1.90429596]),
 array([-6.90647796]),
 array([-6.21360547]),
 array([-1.84318744]),
 array([1.82285686]),
 array([4.62508622]),
 array([-5.60320499]),
 array([-8.60513286])]
# get the mean, and check that it matches the one returned above by the .predict method of the ensemble:
np.mean(raw_pred)
# -2.8720241079257436
np.mean(raw_pred) == regr.predict([[0, 0, 0, 0]])  # sanity check; .predict returns an array
# array([ True])
# get the standard deviation:
np.std(raw_pred)
# 3.865135037828279
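For more than one test sample, the same idea vectorizes: stack the per-estimator predictions into a 2-D array and take the mean and standard deviation along the estimator axis. A minimal sketch of this (the names X_test, all_pred, mean_pred, and std_pred are illustrative, not part of the scikit-learn API; the estimator is passed positionally so the snippet works across scikit-learn versions, where the keyword was renamed from base_estimator to estimator):

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.ensemble import BaggingRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, n_features=4,
                       n_informative=2, n_targets=1,
                       random_state=0, shuffle=False)
# pass the base estimator positionally (keyword name differs across versions)
regr = BaggingRegressor(SVR(), n_estimators=10, random_state=0).fit(X, y)

X_test = X[:5]  # a few samples to predict on (illustrative choice)
# shape (n_estimators, n_samples): one row of predictions per ensemble member
all_pred = np.stack([est.predict(X_test) for est in regr.estimators_])

mean_pred = all_pred.mean(axis=0)  # per-sample mean, same as regr.predict(X_test)
std_pred = all_pred.std(axis=0)    # per-sample standard deviation
```

Note that the ensemble's .predict is exactly this mean over members, which is why mean_pred agrees with it; the standard deviation is the extra information you compute yourself.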