XGBoost and GridSearchCV in Python


I have a question about this tutorial.

The author is doing hyperparameter tuning. The first window shows the different hyperparameter values to try.

Then he initializes GridSearchCV with cv=3 and scoring='roc_auc'.

Then he fits GridSearchCV, passing eval_set and eval_metric='auc'.

  1. What is the purpose of using both cv and eval_set? Shouldn't we use just one of them? How are they used together with scoring='roc_auc' and eval_metric='auc'?
  2. Is there a better way to do hyperparameter tuning with GridSearchCV? Please suggest one or provide a link.

Solution

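    1. GridSearchCV performs cross-validation for hyperparameter tuning using only the training data: with cv=3, each candidate is trained on two folds and scored on the third using scoring='roc_auc'. The eval_set is a separate, fixed dataset that is passed through to XGBoost's own fit method, which reports eval_metric='auc' on it during boosting. Because fit parameters are forwarded to every fit that GridSearchCV runs, this happens for each fold and, since refit=True by default, for the final refit of the best parameters on the full training set; that last evaluation is the closest thing to a held-out test score. You can use any metric for cross-validation and for testing, but it would be odd to use different metrics for the two phases, so the same one is used. As for the slightly different metric names, I think it's just because xgboost provides an sklearn-compatible interface but is not developed by the sklearn team. Both should compute the same thing: the area under the receiver operating characteristic (ROC) curve of the predictions. Take a look at the sklearn docs: auc and roc_auc.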

    2. I don't think there is a better way.