python machine-learning scikit-learn cross-validation grid-search

Performing grid search with a predefined validation set in scikit-learn


This question has been asked several times before, but I get an error when following the answers.

First, I specify which rows belong to the training set and which to the validation set, as follows.

# -1 keeps a row in the training set; 0 assigns it to the single validation fold.
# Training rows come first, followed by the validation rows.
my_test_fold = [-1] * len(train_x) + [0] * len(test_x)
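To make the encoding concrete, here is a minimal sketch with made-up sizes (4 training rows and 2 validation rows; `n_train` and `n_valid` are hypothetical names, not from the code above). In `PredefinedSplit`, a label of -1 keeps a row out of every test fold and a label of 0 puts it in test fold 0, so the result is a single train/validation split. The labels have to follow the same row order as the data later passed to `fit`.

```python
import numpy as np
from sklearn.model_selection import PredefinedSplit

# Hypothetical sizes: 4 training rows followed by 2 validation rows.
n_train, n_valid = 4, 2
test_fold = np.concatenate([np.full(n_train, -1), np.zeros(n_valid, dtype=int)])

ps = PredefinedSplit(test_fold=test_fold)
n_splits = ps.get_n_splits()

# split() yields (train_indices, test_indices) pairs; here there is exactly one.
splits = list(ps.split())
train_idx, valid_idx = splits[0]

print(n_splits)            # number of predefined folds
print(train_idx.tolist())  # rows labelled -1
print(valid_idx.tolist())  # rows labelled 0
```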

Then I run the grid search.

from sklearn.model_selection import PredefinedSplit
from xgboost import XGBClassifier

# Single-point grid: each hyperparameter has exactly one candidate value.
param = {
    'n_estimators': [200],
    'max_depth': [5],
    'min_child_weight': [3],
    'reg_alpha': [6],
    'gamma': [0.6],
    'scale_neg_weight': [1],  # as posted; XGBoost documents this parameter as scale_pos_weight
    'learning_rate': [0.09],
}

gsearch1 = GridSearchCV(
    estimator=XGBClassifier(objective='reg:linear', seed=1),
    param_grid=param,
    scoring='roc_auc',
    cv=PredefinedSplit(test_fold=my_test_fold),
    verbose=1,
)

gsearch1.fit(new_data_df, df_y)

But I get the following error:

 object of type 'PredefinedSplit' has no len()

Solution

  • Try replacing

    cv = PredefinedSplit(test_fold=my_test_fold)
    

    with

    cv = list(PredefinedSplit(test_fold=my_test_fold).split(new_data_df, df_y))
    

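    The reason is that PredefinedSplit is a splitter object rather than a sequence of splits, so it has no length. Calling its split method yields the (train indices, test indices) pairs, and wrapping that generator in list() gives the grid search a concrete list, which does have a length.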
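The fix can be sketched end to end as follows. This is only an illustration under stated assumptions: the data is synthetic (`make_classification`), `LogisticRegression` stands in for `XGBClassifier` so the example has no extra dependency, and `GridSearchCV` is imported from `sklearn.model_selection` (the question does not show where it was imported from). Names such as `X_all`, `y_all`, and `cv_splits` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, PredefinedSplit

# Synthetic stand-ins for the training and validation sets (hypothetical data).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, y_train = X[:150], y[:150]
X_valid, y_valid = X[150:], y[150:]

# Stack training rows first, then validation rows, matching the fold labels.
X_all = np.vstack([X_train, X_valid])
y_all = np.concatenate([y_train, y_valid])
test_fold = [-1] * len(X_train) + [0] * len(X_valid)

# Materialise the single (train_indices, validation_indices) pair as a list.
cv_splits = list(PredefinedSplit(test_fold=test_fold).split(X_all, y_all))

search = GridSearchCV(
    estimator=LogisticRegression(max_iter=1000),  # stand-in estimator
    param_grid={"C": [0.1, 1.0, 10.0]},
    scoring="roc_auc",
    cv=cv_splits,
)
search.fit(X_all, y_all)

n_splits = len(cv_splits)
n_candidates = len(search.cv_results_["params"])
print(n_splits, n_candidates, search.best_params_)
```

With the default `refit=True`, the best configuration is refit on every row passed to `fit`, validation rows included. Pass `refit=False` if the final model should not see the validation set.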