
sklearn cross-validation with GridSearch


My question is: when I fit with GridSearch, do I need to do anything extra to get cross-validation?

What I know so far:

1. I can set a scoring function (but it is not clear to me how to do that).

2. If I do not pass a 'cv' parameter to the GridSearch class, it uses the default 3-fold cross-validation (since scikit-learn 0.22 the default is 5-fold).

What exactly does GridSearch do with the data I pass? Is all of it used for training, or is it split internally into training and test data?

Thanks!

P.S.: It seems that my classifier is overfitting, because it scores 100% but does not give good results on new data.


Solution

  • Please have a look at the GridSearchCV documentation; it describes everything you need in detail.

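    A plain grid search trains the given estimator for every combination of the given parameter values and picks the combination that gives the best score on the training data itself. (scikit-learn scorers follow a greater-is-better convention, so losses are exposed as negated scores such as 'neg_mean_squared_error'; the best score is always the highest.) Scoring on the same data the model was trained on rewards overfitting, which is a likely explanation for a 100% score that does not carry over to new data.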
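To make this concrete, here is a minimal sketch of a grid search without cross-validation, written as a manual loop over scikit-learn's ParameterGrid. The synthetic dataset, the DecisionTreeClassifier, and the max_depth grid are assumptions chosen purely for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import ParameterGrid, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with some label noise (illustrative assumption)
X, y = make_classification(n_samples=500, n_features=20, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

param_grid = {"max_depth": [1, 3, 5, None]}  # None: grow the tree until leaves are pure

best_params, best_train_score = None, -1.0
for params in ParameterGrid(param_grid):
    clf = DecisionTreeClassifier(random_state=0, **params).fit(X_train, y_train)
    train_score = clf.score(X_train, y_train)  # scored on the data it was fit on
    if train_score > best_train_score:
        best_params, best_train_score = params, train_score

# Evaluate the selected setting on data it has never seen
final = DecisionTreeClassifier(random_state=0, **best_params).fit(X_train, y_train)
test_score = final.score(X_test, y_test)
print("selected:", best_params, "train:", best_train_score, "test:", test_score)
```

Selecting by training score tends to favor the most flexible setting, which fits the training data almost perfectly; the test score shows how little of that carries over.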

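    GridSearchCV does the same thing, but scores each parameter combination with cross-validation internally: the data you pass to fit is split into k folds, the estimator is trained on k-1 folds and scored on the held-out fold, this is repeated k times, and the scores are averaged. The combination with the best mean score is reported in best_params_ and best_score_, and by default (refit=True) the estimator is then refit with those parameters on all the data you passed. Parameters for the estimator are supplied through the param_grid argument.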
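A minimal sketch of that workflow, assuming the same illustrative synthetic data and DecisionTreeClassifier as before. It keeps a separate test set that GridSearchCV never sees, and then recomputes the best candidate's cross-validated score by hand with cross_val_score to show what happens inside the search.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with some label noise (illustrative assumption)
X, y = make_classification(n_samples=500, n_features=20, flip_y=0.1, random_state=0)

# Hold out a test set that the grid search never sees
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

param_grid = {"max_depth": [1, 3, 5, None]}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid=param_grid, cv=5)

# For every candidate: 5 fits, each on 4/5 of X_train, scored on the remaining 1/5.
# Then, because refit=True by default, the best candidate is refit on all of X_train.
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("mean CV score:", search.best_score_)
print("test score:", search.score(X_test, y_test))  # uses best_estimator_

# The same cross-validated score, computed by hand for the best candidate
manual = cross_val_score(
    DecisionTreeClassifier(random_state=0, **search.best_params_),
    X_train,
    y_train,
    cv=5,
).mean()
print("manual CV score:", manual)
```

Because best_score_ was used to choose the parameters, it is slightly optimistic; the score on the held-out test set is the fairer estimate of performance on new data.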

    To answer your specific questions:

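    1. Scoring - You can pass any of the predefined scoring strings listed in the scikit-learn documentation (which ones make sense depends on your estimator), or your own custom scorer built with make_scorer.
    2. CV - Similarly for cv: you can pass an integer to get k-fold cross-validation with that many folds (for classifiers with binary or multiclass targets, StratifiedKFold is used), or a cross-validation splitter object. The available CV iterators (KFold, StratifiedKFold, and others) are listed in the documentation.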
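A minimal sketch of both options from the list above, assuming a binary classification problem. The SVC estimator, the C grid, the synthetic data, and the choice of F2 (fbeta_score with beta=2) as the custom metric are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import fbeta_score, make_scorer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
param_grid = {"C": [0.1, 1, 10]}

# Option 1: scoring as a predefined string, cv as an integer (number of folds)
search_str = GridSearchCV(SVC(), param_grid, scoring="f1", cv=5)
search_str.fit(X, y)

# Option 2: scoring as a custom scorer, cv as a splitter object
f2_scorer = make_scorer(fbeta_score, beta=2)  # extra keyword args are passed to fbeta_score
cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
search_obj = GridSearchCV(SVC(), param_grid, scoring=f2_scorer, cv=cv)
search_obj.fit(X, y)

print(search_str.best_params_, search_str.best_score_)
print(search_obj.best_params_, search_obj.best_score_)
```

Passing a splitter object instead of an integer lets you control shuffling and the random seed, which an integer cv does not.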