machine-learning, data-science, xgboost, hyperopt

XGBoost with Hyperopt: error while tuning hyperparameters


I am trying to tune the hyperparameters of an XGBoost classifier with Hyperopt, but I am getting an error. Below are the code I am using and the error:

Step 1: Objective function

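import csv
from timeit import default_timer as timer

import numpy as np
import xgboost as xgb
from hyperopt import STATUS_OK

# train_set (an xgb.DMatrix) and out_file (a CSV path) are assumed
# to be defined elsewhere in the notebook
MAX_EVALS = 200
N_FOLDS = 10
ITERATION = 0  # global eval counter, incremented by objective()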
def objective(params, n_folds = N_FOLDS):
    """Objective function for XGBoost Hyperparameter Optimization"""
    # Keep track of evals
    global ITERATION
    ITERATION += 1
#     # Retrieve the subsample if present otherwise set to 1.0
#     subsample = params['boosting_type'].get('subsample', 1.0)
#     # Extract the boosting type
#     params['boosting_type'] = params['boosting_type']['boosting_type']
#     params['subsample'] = subsample
    # Make sure parameters that need to be integers are integers
    # (colsample_bytree is a fraction in (0, 1], so it must not be cast)
    for parameter_name in ['max_depth', 'min_child_weight']:
        params[parameter_name] = int(params[parameter_name])
    start = timer()
    # Perform n_folds cross validation
    cv_results = xgb.cv(params, train_set, num_boost_round = 10000, 
                       nfold = n_folds, early_stopping_rounds = 100, 
                       metrics = 'auc', seed = 50)
    run_time = timer() - start
    # Extract the best score
    best_score = np.max(cv_results['auc-mean'])
    # Loss must be minimized
    loss = 1 - best_score
    # Boosting rounds that returned the highest cv score
    n_estimators = int(np.argmax(cv_results['auc-mean']) + 1)
    # Append a row to the csv file; the with-block closes (and flushes) it
    with open(out_file, 'a', newline='') as of_connection:
        writer = csv.writer(of_connection)
        writer.writerow([loss, params, ITERATION, n_estimators,
                         run_time])
    # Dictionary with information for evaluation
    return {'loss': loss, 'params': params, 'iteration': ITERATION,
           'estimators': n_estimators, 'train_time': run_time, 
           'status': STATUS_OK}

I have also defined the search space and the optimization algorithm. When I run Hyperopt, the objective function raises the following error:

Error: KeyError: 'auc-mean'

<ipython-input-62-8d4e97f16929> in objective(params, n_folds)
     25     run_time = timer() - start
     26     # Extract the best score
---> 27     best_score = np.max(cv_results['auc-mean'])
     28     # Loss must be minimized
     29     loss = 1 - best_score

Solution

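  • First, print cv_results and check which keys exist. xgb.cv prefixes each metric with the split it was computed on ('train' or 'test'), so there is no plain 'auc-mean' key.

    In the example notebook linked below, the keys were 'test-auc-mean' and 'train-auc-mean'.

    See cell 5 here: https://www.kaggle.com/tilii7/bayesian-optimization-of-xgboost-parameters

  • Then use 'test-auc-mean' (the score on the held-out folds) in both the np.max and the np.argmax lines of the objective function.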