Error while running GridSearchCV for XGBoost hyperparameters in Python



I have a dataset as follows; the first 5 rows are shown for reference:

    gvkey   year    ebit_diff   cogs_diff   revt_diff   xad_diff    xint_diff   xrd_diff    xrent_diff  xsga_diff
0   1004    2011    0.007816    0.081074    0.051726    -0.02617    0.011864    -0.052201   -0.016440   -0.048060
1   1004    2012    -0.028573   0.032022    0.002105    -0.02617    0.035253    -0.052201   -0.024924   -0.050444
2   1004    2013    -0.039717   -0.080926   -0.079771   -0.02617    0.011793    -0.052201   -0.009906   -0.050436
3   1004    2014    -0.027915   -0.184351   -0.169031   -0.02617    -0.012772   -0.052201   -0.032912   -0.094717
4   1004    2015    -0.185687   -0.243326   -0.291618   -0.02617    -0.126708   -0.052201   -0.059853   -0.126411


I do not need to include the categorical variables 'gvkey' and 'year'. I have done a train_test_split and am running XGBoost, using GridSearchCV to determine the optimal hyperparameters.

X_train:

    cogs_diff   revt_diff   xad_diff    xint_diff   xrd_diff    xrent_diff  xsga_diff
0   0.081074    0.051726    -0.02617    0.011864    -0.052201   -0.016440   -0.048060
1   0.032022    0.002105    -0.02617    0.035253    -0.052201   -0.024924   -0.050444
2   -0.080926   -0.079771   -0.02617    0.011793    -0.052201   -0.009906   -0.050436
3   -0.184351   -0.169031   -0.02617    -0.012772   -0.052201   -0.032912   -0.094717
4   -0.243326   -0.291618   -0.02617    -0.126708   -0.052201   -0.059853   -0.126411

cogs_diff     float64
revt_diff     float64
xad_diff      float64
xint_diff     float64
xrd_diff      float64
xrent_diff    float64
xsga_diff     float64
dtype: object


y_train:

0    0.007816
1   -0.028573
2   -0.039717
3   -0.027915
4   -0.185687
Name: ebit_diff, dtype: float64

```
import time

from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

# 1. Set up a parameter grid for XGBoost
params = {
    "max_depth": [2, 4, 6],
    "learning_rate": [0.001, 0.05, 0.1],
    "n_estimators": [20, 40, 60],
    "max_features": [2, 4, 6]
}

# 2. Set up xgboost classifier - so that the performance metric is RMSE, not something else
xgb = XGBClassifier(eval_metric='rmse')

# 3. Set up GridSearchCV parameters - perform 5-fold cross validation for hyperparameter tuning on this training dataset.
start_time = time.time()

grid = GridSearchCV(estimator=xgb, param_grid=params, cv=5, scoring='roc_auc', verbose=3)
grid.fit(X_train, y_train)
```


However, I encounter the following error:

ValueError: continuous format is not supported


Does anyone know how to resolve this issue?


Solution

  • Your problem appears to be a regression problem (the target ebit_diff is continuous), but you instantiated an XGBClassifier (with eval_metric='rmse'), where an XGBRegressor should probably be used.

    In addition, you are passing scoring='roc_auc' to GridSearchCV, which is probably what raises the exception. The area under the ROC curve is a classification metric, so it is not an appropriate score for a regression problem, and that is likely what the message "continuous format is not supported" is trying to tell you. For regression, a scorer such as 'neg_root_mean_squared_error' matches the stated goal of evaluating by RMSE.