I am trying to compare different regression strategies for a forecasting problem.
The scikit-learn documentation for the multi-output wrappers is not very detailed, but the set_params docstring says:
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline).
The latter have parameters of the form <component>__<parameter> so that it’s possible to
update each component of a nested object.
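Here is a minimal sketch of that naming rule on the bare pipeline, assuming scikit-learn is installed. To keep it runnable without XGBoost, a DecisionTreeRegressor stands in for xgb.XGBRegressor (an assumption, not part of the original setup). The step names match the question:

```python
from sklearn.feature_selection import VarianceThreshold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor  # assumed stand-in for xgb.XGBRegressor

# Same step names as in the question; only the final model is swapped.
pipe = Pipeline([('scaler', StandardScaler()),
                 ('variance_selector', VarianceThreshold(threshold=0.03)),
                 ('estimator', DecisionTreeRegressor())])

# "estimator__max_depth" = step named 'estimator', then its parameter max_depth.
pipe.set_params(estimator__max_depth=3)
print(pipe.named_steps['estimator'].max_depth)  # 3
```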
Therefore I am building my pipeline as:
pipeline_xgboost = Pipeline([('scaler', StandardScaler()),
('variance_selector', VarianceThreshold(threshold=0.03)),
('estimator', xgb.XGBRegressor())])
Then I create the wrapper as:
mimo_wrapper = MultiOutputRegressor(pipeline_xgboost)
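Wrapping the pipeline adds one more level of nesting. MultiOutputRegressor exposes the object it wraps through its own estimator parameter, so every pipeline parameter name gains a second estimator__ prefix. The sketch below checks the resulting names with get_params(), again using DecisionTreeRegressor as an assumed stand-in for xgb.XGBRegressor:

```python
from sklearn.feature_selection import VarianceThreshold
from sklearn.multioutput import MultiOutputRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor  # assumed stand-in for xgb.XGBRegressor

pipe = Pipeline([('scaler', StandardScaler()),
                 ('variance_selector', VarianceThreshold(threshold=0.03)),
                 ('estimator', DecisionTreeRegressor())])
mimo_wrapper = MultiOutputRegressor(pipe)

keys = mimo_wrapper.get_params(deep=True).keys()
# First 'estimator' = MultiOutputRegressor's parameter holding the pipeline,
# second 'estimator' = the pipeline step named 'estimator'.
print('estimator__estimator__max_depth' in keys)  # True
print('estimator__max_depth' in keys)             # False
```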
Following the scikit-learn Pipeline documentation, I define my XGBoost parameters as:
parameters = {
'estimator__reg_alpha': [0.0001, 0.001, 0.01, 0.1, 1, 10, 100],
'estimator__max_depth': [10, 100, 1000]
}
Then I run the cross-validation as:
randomized_search = RandomizedSearchCV(mimo_wrapper, parameters, random_state=0, n_iter=5,
n_jobs=-1, refit=True, cv=3, verbose=True,
pre_dispatch='2*n_jobs', error_score='raise')
However, I get the following error:
ValueError: Invalid parameter reg_alpha for estimator Pipeline(steps=[('scaler', StandardScaler()),
('variance_selector', VarianceThreshold(threshold=0.03)),
('estimator',
XGBRegressor(base_score=None, booster=None,
colsample_bylevel=None, colsample_bynode=None,
colsample_bytree=None, gamma=None, gpu_id=None,
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
random_state=None, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
validate_parameters=None, verbosity=None))]). Check the list of available parameters with `estimator.get_params().keys()`.
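The message is consistent with how set_params resolves nested names: it strips one prefix per level. On mimo_wrapper, estimator__reg_alpha means "set reg_alpha on whatever the wrapper's estimator parameter holds". That object is the Pipeline, and a Pipeline has no parameter of that name. The sketch below reproduces the same failure with the DecisionTreeRegressor stand-in. It uses max_depth instead, since a decision tree has no reg_alpha:

```python
from sklearn.feature_selection import VarianceThreshold
from sklearn.multioutput import MultiOutputRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor  # assumed stand-in for xgb.XGBRegressor

pipe = Pipeline([('scaler', StandardScaler()),
                 ('variance_selector', VarianceThreshold(threshold=0.03)),
                 ('estimator', DecisionTreeRegressor())])
mimo_wrapper = MultiOutputRegressor(pipe)

raised = False
try:
    # One prefix: routed to Pipeline.set_params(max_depth=3), which the Pipeline rejects.
    mimo_wrapper.set_params(estimator__max_depth=3)
except ValueError as err:
    raised = True
    print('ValueError:', str(err)[:60])

# Two prefixes: wrapper -> pipeline -> step 'estimator' -> max_depth.
mimo_wrapper.set_params(estimator__estimator__max_depth=3)
print(mimo_wrapper.estimator.named_steps['estimator'].max_depth)  # 3
```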
Did I misunderstand the scikit-learn documentation? I have also tried naming the parameters estimator__estimator__param, on the theory that this is how to reach the parameters once the pipeline is inside mimo_wrapper. This proved unsuccessful (example below):
parameters = {
'estimator__estimator__reg_alpha': [0.0001, 0.001, 0.01, 0.1, 1, 10, 100],
'estimator__estimator__max_depth': [10, 100, 1000]
}
random_grid = RandomizedSearchCV(estimator=pipeline_xgboost, param_distributions=parameters, random_state=0, n_iter=5,
n_jobs=-1, refit=True, cv=3, verbose=True,
pre_dispatch='2*n_jobs', error_score='raise')
hyperparameters_tuning = random_grid.fit(df.drop(columns=TARGETS+UMAPS),
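Note that this snippet passes pipeline_xgboost to RandomizedSearchCV, not mimo_wrapper. Parameter names are resolved against whatever estimator the search receives, so the double prefix only matches valid names when the wrapper itself is searched. The sketch below is a small pre-flight check. check_param_names is a hypothetical helper, and DecisionTreeRegressor is again an assumed stand-in for xgb.XGBRegressor:

```python
from sklearn.feature_selection import VarianceThreshold
from sklearn.multioutput import MultiOutputRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor  # assumed stand-in for xgb.XGBRegressor


def check_param_names(estimator, param_distributions):
    """Hypothetical helper: return the keys that are not valid parameter names of estimator."""
    valid = estimator.get_params(deep=True).keys()
    return sorted(k for k in param_distributions if k not in valid)


pipe = Pipeline([('scaler', StandardScaler()),
                 ('variance_selector', VarianceThreshold(threshold=0.03)),
                 ('estimator', DecisionTreeRegressor())])
mimo_wrapper = MultiOutputRegressor(pipe)

parameters = {'estimator__estimator__max_depth': [10, 100, 1000]}
print(check_param_names(pipe, parameters))          # ['estimator__estimator__max_depth']
print(check_param_names(mimo_wrapper, parameters))  # []
print(check_param_names(pipe, {'estimator__max_depth': [10]}))  # []
```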
Interestingly, I noticed that setting the estimator parameters outside the random search works fine:
parameters = dict({
'estimator__max_depth': [10, 100, 1000]
})
As you can see, max_depth has now changed:
Pipeline(steps=[('scaler', StandardScaler()),
('variance_selector', VarianceThreshold(threshold=0.03)),
('estimator',
XGBRegressor(base_score=None, booster=None,
colsample_bylevel=None, colsample_bynode=None,
colsample_bytree=None, gamma=None, gpu_id=None,
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=200,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
random_state=None, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
validate_parameters=None, verbosity=None))])
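Setting the parameter by hand works because it is roughly what the search does for each candidate. The search samples a parameter dict and applies it with set_params to a clone of the estimator it was given, so the same names are valid in both places. The sketch below shows that candidate loop with ParameterSampler, omitting fitting and scoring. DecisionTreeRegressor is an assumed stand-in for xgb.XGBRegressor:

```python
from sklearn.base import clone
from sklearn.feature_selection import VarianceThreshold
from sklearn.model_selection import ParameterSampler
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor  # assumed stand-in for xgb.XGBRegressor

pipe = Pipeline([('scaler', StandardScaler()),
                 ('variance_selector', VarianceThreshold(threshold=0.03)),
                 ('estimator', DecisionTreeRegressor())])
param_distributions = {'estimator__max_depth': [10, 100, 1000]}

depths = []
# With lists only and n_iter equal to the grid size, every value is drawn once.
for candidate in ParameterSampler(param_distributions, n_iter=3, random_state=0):
    # Each candidate is a dict such as {'estimator__max_depth': 100}.
    model = clone(pipe).set_params(**candidate)
    depths.append(model.named_steps['estimator'].max_depth)

print(sorted(depths))  # [10, 100, 1000]
```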
Dear colleagues, it seems this was due to a problem in xgb.XGBRegressor. In any case, the right way to name the parameters for a Pipeline wrapped in MultiOutputRegressor is:
parameters = {
'estimator__estimator__reg_alpha': [0.0001, 0.001, 0.01, 0.1, 1, 10, 100],
'estimator__estimator__max_depth': [10, 100, 1000]
}
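Below is a self-contained sketch of the working setup on synthetic data, assuming scikit-learn and NumPy are installed. DecisionTreeRegressor again stands in for xgb.XGBRegressor, so min_samples_leaf replaces reg_alpha. The key point is that mimo_wrapper, not the bare pipeline, is passed to RandomizedSearchCV:

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold
from sklearn.model_selection import RandomizedSearchCV
from sklearn.multioutput import MultiOutputRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor  # assumed stand-in for xgb.XGBRegressor

# Synthetic data: 5 features, 2 targets (MultiOutputRegressor needs a 2-D y).
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))
Y = np.column_stack([X[:, 0] + X[:, 1], X[:, 2] - X[:, 3]])

pipe = Pipeline([('scaler', StandardScaler()),
                 ('variance_selector', VarianceThreshold(threshold=0.03)),
                 ('estimator', DecisionTreeRegressor())])
mimo_wrapper = MultiOutputRegressor(pipe)

# Two prefixes: wrapper's 'estimator' param, then the pipeline step 'estimator'.
parameters = {
    'estimator__estimator__max_depth': [2, 4, 8],
    'estimator__estimator__min_samples_leaf': [1, 5, 10],
}
search = RandomizedSearchCV(mimo_wrapper, parameters, n_iter=5, cv=3, random_state=0)
search.fit(X, Y)
print(search.best_params_)
```

With the real model, the same two-level names (estimator__estimator__reg_alpha, estimator__estimator__max_depth) should apply, as long as the final pipeline step is still named 'estimator'.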