xgboost.XGBRegressor seems to produce the same results even when a new random seed is given.
According to the xgboost documentation for xgboost.XGBRegressor:
seed : int Random number seed. (Deprecated, please use random_state)
random_state : int Random number seed. (replaces seed)
random_state is the one to use. However, no matter what random_state or seed I use, the model produces the same results. Is this a bug?
from xgboost import XGBRegressor
from sklearn.datasets import load_boston
import numpy as np
from itertools import product

def xgb_train_predict(random_state=0, seed=None):
    X, y = load_boston(return_X_y=True)
    xgb = XGBRegressor(random_state=random_state, seed=seed)
    xgb.fit(X, y)
    y_ = xgb.predict(X)
    return y_

check = xgb_train_predict()

random_state = [1, 42, 58, 69, 72]
seed = [None, 2, 24, 85, 96]

for r, s in product(random_state, seed):
    y_ = xgb_train_predict(r, s)
    assert np.equal(y_, check).all()
    print('CHECK! \t random_state: {} \t seed: {}'.format(r, s))
[Out]:
CHECK! random_state: 1 seed: None
CHECK! random_state: 1 seed: 2
CHECK! random_state: 1 seed: 24
CHECK! random_state: 1 seed: 85
CHECK! random_state: 1 seed: 96
CHECK! random_state: 42 seed: None
CHECK! random_state: 42 seed: 2
CHECK! random_state: 42 seed: 24
CHECK! random_state: 42 seed: 85
CHECK! random_state: 42 seed: 96
CHECK! random_state: 58 seed: None
CHECK! random_state: 58 seed: 2
CHECK! random_state: 58 seed: 24
CHECK! random_state: 58 seed: 85
CHECK! random_state: 58 seed: 96
CHECK! random_state: 69 seed: None
CHECK! random_state: 69 seed: 2
CHECK! random_state: 69 seed: 24
CHECK! random_state: 69 seed: 85
CHECK! random_state: 69 seed: 96
CHECK! random_state: 72 seed: None
CHECK! random_state: 72 seed: 2
CHECK! random_state: 72 seed: 24
CHECK! random_state: 72 seed: 85
CHECK! random_state: 72 seed: 96
It seems (I didn't know this myself before I started digging for an answer :) ) that xgboost uses its random number generator only for sub-sampling; see Laurae's comment on a similar GitHub issue. Otherwise, its behavior is deterministic.
If you had used sampling, you would have run into an issue in how the current sklearn API in xgboost handles seed and random_state. seed is indeed documented as deprecated, but it seems that if one provides it, it still takes precedence over random_state, as can be seen here in the code. This remark is relevant only when seed is not None.