I have a dataset as follows (first 5 rows shown for reference):
```
   gvkey  year  ebit_diff  cogs_diff  revt_diff  xad_diff  xint_diff   xrd_diff  xrent_diff  xsga_diff
0   1004  2011   0.007816   0.081074   0.051726  -0.02617   0.011864  -0.052201   -0.016440  -0.048060
1   1004  2012  -0.028573   0.032022   0.002105  -0.02617   0.035253  -0.052201   -0.024924  -0.050444
2   1004  2013  -0.039717  -0.080926  -0.079771  -0.02617   0.011793  -0.052201   -0.009906  -0.050436
3   1004  2014  -0.027915  -0.184351  -0.169031  -0.02617  -0.012772  -0.052201   -0.032912  -0.094717
4   1004  2015  -0.185687  -0.243326  -0.291618  -0.02617  -0.126708  -0.052201   -0.059853  -0.126411
```
I do not need to include the categorical variables `gvkey` and `year`. I have done a `train_test_split` and am running XGBoost, using `GridSearchCV` to determine the optimal hyperparameters.
X_train:
```
   cogs_diff  revt_diff  xad_diff  xint_diff   xrd_diff  xrent_diff  xsga_diff
0   0.081074   0.051726  -0.02617   0.011864  -0.052201   -0.016440  -0.048060
1   0.032022   0.002105  -0.02617   0.035253  -0.052201   -0.024924  -0.050444
2  -0.080926  -0.079771  -0.02617   0.011793  -0.052201   -0.009906  -0.050436
3  -0.184351  -0.169031  -0.02617  -0.012772  -0.052201   -0.032912  -0.094717
4  -0.243326  -0.291618  -0.02617  -0.126708  -0.052201   -0.059853  -0.126411

cogs_diff     float64
revt_diff     float64
xad_diff      float64
xint_diff     float64
xrd_diff      float64
xrent_diff    float64
xsga_diff     float64
dtype: object
```
y_train:
```
0    0.007816
1   -0.028573
2   -0.039717
3   -0.027915
4   -0.185687
Name: ebit_diff, dtype: float64
```
```
import time

from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

# 1. Set up a parameter grid for XGBoost
params = {
    "max_depth": [2, 4, 6],
    "learning_rate": [0.001, 0.05, 0.1],
    "n_estimators": [20, 40, 60],
    "max_features": [2, 4, 6],
}
# 2. Set up xgboost classifier - so that the performance metric is RMSE, not something else
xgb = XGBClassifier(eval_metric='rmse')
# 3. Set up GridSearchCV parameters - perform 5-fold cross validation for hyperparameter tuning on this training set.
start_time = time.time()
grid = GridSearchCV(estimator=xgb, param_grid=params, cv=5, scoring='roc_auc', verbose=3)
grid.fit(X_train, y_train)
```
However, I encounter the following error:
```
ValueError: continuous format is not supported
```
Does anyone know how to resolve this issue?
It seems that your problem is a regression problem (`ebit_diff` is a continuous target), but you instantiated an `XGBClassifier` (with `eval_metric='rmse'`) where an `XGBRegressor` should probably be used.
In addition, you are using `scoring='roc_auc'` in the construction of `GridSearchCV`, which probably leads to the exception: the area under the ROC curve is not an appropriate score for a regression problem (that is probably what the exception message is trying to tell you). A regression scorer such as `scoring='neg_root_mean_squared_error'` matches the RMSE you are after.