I am training an XGBoost model and doing hyperparameter tuning with RandomizedSearchCV. I specify the parameter distributions as:
from scipy.stats import randint, uniform
from sklearn.model_selection import RandomizedSearchCV
from xgboost import XGBRegressor

# Define an XGBoost regression model
model = XGBRegressor()
params = {
    "colsample_bytree": uniform(0.1, 0.2),  # fraction of cols to sample
    "gamma": uniform(0, 0.3),               # min loss reduction required for next split
    "learning_rate": uniform(0.02, 0.3),    # default 0.1
    "n_estimators": randint(100, 150),      # default 100
    "subsample": uniform(0.8, 0.75)         # % of rows to use in training sample
}
r = RandomizedSearchCV(model, param_distributions=params, n_iter=100,
                       scoring="neg_mean_absolute_error", cv=3, n_jobs=1)
I get the following error, even though the range I have specified for subsample should be within the bounds [0, 1]:
raise XGBoostError(py_str(_LIB.XGBGetLastError()))
xgboost.core.XGBoostError: value 1.10671 for Parameter subsample exceed bound [0,1]
warnings.warn("Estimator fit failed. The score on this train-test"
Any ideas why this could be happening?
I think the issue comes from:
uniform(0.8, 0.75)
For numpy and random, the first argument defines the lower limit and the second the upper limit. Hence, for numpy and random you would want:
uniform(0.75, 0.8)
This applies to both numpy.random.uniform and random.uniform, as the quick check below shows:
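(A minimal check; exact values will vary from run to run.)

import random
import numpy as np

# Both APIs take (low, high) and draw from that interval
print(random.uniform(0.75, 0.8))     # always between 0.75 and 0.8
print(np.random.uniform(0.75, 0.8))  # always between 0.75 and 0.8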
However, scipy.stats.uniform is defined differently. From the scipy docs: "Using the parameters loc and scale, one obtains the uniform distribution on [loc, loc + scale]." That is why uniform(0.8, 0.75) samples from [0.8, 1.55], which is exactly where the out-of-bounds value 1.10671 in your traceback comes from. Since RandomizedSearchCV draws from scipy.stats distributions, for scipy you want:
uniform(0.75, 0.05)
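You can convince yourself of the bounds by sampling from the distributions directly (a minimal sketch; exact values will vary):

from scipy.stats import uniform

# uniform(loc, scale) is the uniform distribution on [loc, loc + scale]
good = uniform(0.75, 0.05).rvs(size=10000)
print(good.min(), good.max())  # both stay within [0.75, 0.8]

# the original spec samples from [0.8, 1.55], hence the error
bad = uniform(0.8, 0.75).rvs(size=10000)
print(bad.max())               # can exceed 1, e.g. values near 1.1

With "subsample": uniform(0.75, 0.05) in your params dict, every sampled value stays within XGBoost's [0, 1] bound and the search runs without the error.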