Tags: machine-learning, scikit-learn, hyperparameters, optuna

Why is Optuna stuck at trial 2 (trial_id=3) after it has calculated all the hyperparameters?


I am using Optuna to tune an XGBoost model's hyperparameters. It has been stuck at trial 2 (trial_id=3) for a long time (244 minutes). When I look at the SQLite database that records the trial data, I see that all of trial 2's (trial_id=3) hyperparameters have already been calculated; only the mean squared error value for trial 2 is missing, and the trial seems stuck at that step. Why does this happen, and how can I fix it?

Here is the code

import optuna
from sklearn.metrics import make_scorer, mean_squared_error
from sklearn.model_selection import cross_val_score
from xgboost import XGBRegressor


def xgb_hyperparameter_tuning():
    def objective(trial):
        # Search space for the XGBRegressor hyperparameters
        params = {
            "n_estimators": trial.suggest_int("n_estimators", 1000, 10000, step=100),
            "booster": trial.suggest_categorical("booster", ["gbtree", "gblinear", "dart"]),
            "max_depth": trial.suggest_int("max_depth", 1, 20, step=1),
            "learning_rate": trial.suggest_float("learning_rate", 0.0001, 0.2, step=0.001),
            "min_child_weight": trial.suggest_float("min_child_weight", 1.0, 20.0, step=1.0),
            "colsample_bytree": trial.suggest_float("colsample_bytree", 0.1, 1.0, step=0.1),
            "subsample": trial.suggest_float("subsample", 0.1, 1.0, step=0.1),
            "reg_alpha": trial.suggest_float("reg_alpha", 0.0, 11.0, step=0.1),
            "reg_lambda": trial.suggest_float("reg_lambda", 0.0, 11.0, step=0.1),
            "num_parallel_tree": 10,
            "random_state": 16,
            "n_jobs": 10,
            "early_stopping_rounds": 1000,
        }

        # 20-fold cross-validated MSE; X_train and log_y_train are defined elsewhere
        model = XGBRegressor(**params)
        mse = make_scorer(mean_squared_error)
        cv = cross_val_score(estimator=model, X=X_train, y=log_y_train, cv=20, scoring=mse, n_jobs=-1)
        return cv.mean()

    study = optuna.create_study(
        study_name="HousePriceCompetitionXGB",
        direction="minimize",
        storage="sqlite:///house_price_competition_xgb.db",
        load_if_exists=True,
    )
    study.optimize(objective, n_trials=100)
    return None

xgb_hyperparameter_tuning()

Here is the output

[I 2021-11-16 10:06:27,522] A new study created in RDB with name: HousePriceCompetitionXGB
[I 2021-11-16 10:08:40,050] Trial 0 finished with value: 0.03599314763859092 and parameters: {'n_estimators': 5800, 'booster': 'gblinear', 'max_depth': 4, 'learning_rate': 0.1641, 'min_child_weight': 17.0, 'colsample_bytree': 0.4, 'subsample': 0.30000000000000004, 'reg_alpha': 10.8, 'reg_lambda': 7.6000000000000005}. Best is trial 0 with value: 0.03599314763859092.
[I 2021-11-16 10:11:55,830] Trial 1 finished with value: 0.028514652199592445 and parameters: {'n_estimators': 6600, 'booster': 'gblinear', 'max_depth': 17, 'learning_rate': 0.0821, 'min_child_weight': 20.0, 'colsample_bytree': 0.7000000000000001, 'subsample': 0.2, 'reg_alpha': 1.2000000000000002, 'reg_lambda': 7.2}. Best is trial 1 with value: 0.028514652199592445.

Here is the data from the SQLite database's trial_values table:

trial_value_id | trial_id | objective | value
1 | 1 | 0 | 0.0359931476385909
2 | 2 | 0 | 0.0285146521995924

Here is the data from the SQLite database's trial_params table. You can see that all of trial 2's (trial_id=3) hyperparameters have already been calculated:

param_id | trial_id | param_name | param_value | distribution_json
1 | 1 | n_estimators | 5800.0 | {"name": "IntUniformDistribution", "attributes": {"low": 1000, "high": 10000, "step": 100}}
2 | 1 | booster | 1.0 | {"name": "CategoricalDistribution", "attributes": {"choices": ["gbtree", "gblinear", "dart"]}}
3 | 1 | max_depth | 4.0 | {"name": "IntUniformDistribution", "attributes": {"low": 1, "high": 20, "step": 1}}
4 | 1 | learning_rate | 0.1641 | {"name": "DiscreteUniformDistribution", "attributes": {"low": 0.0001, "high": 0.1991, "q": 0.001}}
5 | 1 | min_child_weight | 17.0 | {"name": "DiscreteUniformDistribution", "attributes": {"low": 1.0, "high": 20.0, "q": 1.0}}
6 | 1 | colsample_bytree | 0.4 | {"name": "DiscreteUniformDistribution", "attributes": {"low": 0.1, "high": 1.0, "q": 0.1}}
7 | 1 | subsample | 0.3 | {"name": "DiscreteUniformDistribution", "attributes": {"low": 0.1, "high": 1.0, "q": 0.1}}
8 | 1 | reg_alpha | 10.8 | {"name": "DiscreteUniformDistribution", "attributes": {"low": 0.0, "high": 11.0, "q": 0.1}}
9 | 1 | reg_lambda | 7.6 | {"name": "DiscreteUniformDistribution", "attributes": {"low": 0.0, "high": 11.0, "q": 0.1}}
10 | 2 | n_estimators | 6600.0 | {"name": "IntUniformDistribution", "attributes": {"low": 1000, "high": 10000, "step": 100}}
11 | 2 | booster | 1.0 | {"name": "CategoricalDistribution", "attributes": {"choices": ["gbtree", "gblinear", "dart"]}}
12 | 2 | max_depth | 17.0 | {"name": "IntUniformDistribution", "attributes": {"low": 1, "high": 20, "step": 1}}
13 | 2 | learning_rate | 0.0821 | {"name": "DiscreteUniformDistribution", "attributes": {"low": 0.0001, "high": 0.1991, "q": 0.001}}
14 | 2 | min_child_weight | 20.0 | {"name": "DiscreteUniformDistribution", "attributes": {"low": 1.0, "high": 20.0, "q": 1.0}}
15 | 2 | colsample_bytree | 0.7 | {"name": "DiscreteUniformDistribution", "attributes": {"low": 0.1, "high": 1.0, "q": 0.1}}
16 | 2 | subsample | 0.2 | {"name": "DiscreteUniformDistribution", "attributes": {"low": 0.1, "high": 1.0, "q": 0.1}}
17 | 2 | reg_alpha | 1.2 | {"name": "DiscreteUniformDistribution", "attributes": {"low": 0.0, "high": 11.0, "q": 0.1}}
18 | 2 | reg_lambda | 7.2 | {"name": "DiscreteUniformDistribution", "attributes": {"low": 0.0, "high": 11.0, "q": 0.1}}
19 | 3 | n_estimators | 7700.0 | {"name": "IntUniformDistribution", "attributes": {"low": 1000, "high": 10000, "step": 100}}
20 | 3 | booster | 2.0 | {"name": "CategoricalDistribution", "attributes": {"choices": ["gbtree", "gblinear", "dart"]}}
21 | 3 | max_depth | 4.0 | {"name": "IntUniformDistribution", "attributes": {"low": 1, "high": 20, "step": 1}}
22 | 3 | learning_rate | 0.1221 | {"name": "DiscreteUniformDistribution", "attributes": {"low": 0.0001, "high": 0.1991, "q": 0.001}}
23 | 3 | min_child_weight | 3.0 | {"name": "DiscreteUniformDistribution", "attributes": {"low": 1.0, "high": 20.0, "q": 1.0}}
24 | 3 | colsample_bytree | 0.5 | {"name": "DiscreteUniformDistribution", "attributes": {"low": 0.1, "high": 1.0, "q": 0.1}}
25 | 3 | subsample | 0.1 | {"name": "DiscreteUniformDistribution", "attributes": {"low": 0.1, "high": 1.0, "q": 0.1}}
26 | 3 | reg_alpha | 10.8 | {"name": "DiscreteUniformDistribution", "attributes": {"low": 0.0, "high": 11.0, "q": 0.1}}
27 | 3 | reg_lambda | 1.1 | {"name": "DiscreteUniformDistribution", "attributes": {"low": 0.0, "high": 11.0, "q": 0.1}}
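
For reference, the same information can be read back through the Optuna API instead of querying SQLite directly. A minimal sketch, reusing the study name and storage URL from the code above; a stuck trial should show up with its parameters recorded but no value yet:

import optuna

study = optuna.load_study(
    study_name="HousePriceCompetitionXGB",
    storage="sqlite:///house_price_competition_xgb.db",
)
for t in study.trials:
    # A finished trial has a value; a stuck one still has value=None and state RUNNING
    print(t.number, t.state, t.value, t.params)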

Solution

  • Although I am not 100% sure, I think I know what happened.

    This issue happens because some parameters are not suitable for certain booster types; such a trial returns nan and gets stuck at the step that calculates the MSE score.

    To solve the problem, you just need to remove "dart" from the booster choices.

    In other words, using "booster": trial.suggest_categorical("booster", ["gbtree", "gblinear"]) rather than "booster": trial.suggest_categorical("booster", ["gbtree", "gblinear", "dart"]) solves the problem (see the sketch below).

    I got the idea when I was tuning my LGBMRegressor model. Many trials failed because they returned nan, and they all used the same "boosting_type": "rf". After I removed rf, all 100 trials completed without any error. Then I looked back at the XGBRegressor issue posted above and found that all the stuck trials used "booster": "dart" as well. After I removed dart, the XGBRegressor study ran normally.
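
    Below is a minimal sketch of the adjusted objective. Only the booster choices actually change; the shortened parameter list and the nan guard at the end are my own simplifications and additions, not part of the original code. X_train and log_y_train are assumed to be defined as in the question.

    import numpy as np
    import optuna
    from sklearn.metrics import make_scorer, mean_squared_error
    from sklearn.model_selection import cross_val_score
    from xgboost import XGBRegressor

    def objective(trial):
        params = {
            # "dart" removed from the choices; the stuck trials all used it
            "booster": trial.suggest_categorical("booster", ["gbtree", "gblinear"]),
            "n_estimators": trial.suggest_int("n_estimators", 1000, 10000, step=100),
            "max_depth": trial.suggest_int("max_depth", 1, 20, step=1),
            "learning_rate": trial.suggest_float("learning_rate", 0.0001, 0.2, step=0.001),
            "random_state": 16,
        }
        model = XGBRegressor(**params)
        scores = cross_val_score(model, X_train, log_y_train, cv=20,
                                 scoring=make_scorer(mean_squared_error), n_jobs=-1)
        # Defensive guard (my addition): if any fold still produces nan,
        # prune the trial instead of returning a meaningless value.
        if np.isnan(scores).any():
            raise optuna.TrialPruned()
        return scores.mean()

    With this guard in place, a bad parameter combination shows up in the study as a pruned trial rather than a hung or failed one.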