I was experimenting with sklearn's GridSearchCV, and I don't understand why the mean ROC AUC score reported in cv_results_ for a single split (defined with an iterable) differs from what I get by calling the score method after fitting, or by calling the roc_auc_score function directly.
This is my data shape:
print(X.shape)
print(X.index)
print(y.shape)
print(y.index)
(31695, 1379)
RangeIndex(start=0, stop=31695, step=1)
(31695,)
RangeIndex(start=0, stop=31695, step=1)
This is how I define the cv_split:
cv_split = [(np.arange(15848), np.arange(15848, 31695))]
cv_split
[(array([ 0, 1, 2, ..., 15845, 15846, 15847]),
array([15848, 15849, 15850, ..., 31692, 31693, 31694]))]
Fitting the model and resulting cv_results_:
gs_algorithm = GridSearchCV(estimator=LGBMClassifier(),
                            param_grid=hyperparameter_space,
                            scoring='roc_auc',
                            n_jobs=1,
                            pre_dispatch=1,
                            cv=cv_split,
                            verbose=10,
                            return_train_score=True)
gs_algorithm.fit(X, y)
gs_algorithm.cv_results_
Fitting 1 folds for each of 1 candidates, totalling 1 fits
...
{'mean_fit_time': array([17.40988088]),
'std_fit_time': array([0.]),
'mean_score_time': array([1.16691899]),
'std_score_time': array([0.]),
'param_colsample_bytree': masked_array(data=[0.2],
mask=[False],
fill_value='?',
dtype=object),
'param_learning_rate': masked_array(data=[0.1],
mask=[False],
fill_value='?',
dtype=object),
'param_max_depth': masked_array(data=[-1],
mask=[False],
fill_value='?',
dtype=object),
'param_min_child_samples': masked_array(data=[3000],
mask=[False],
fill_value='?',
dtype=object),
'param_min_child_weight': masked_array(data=[0],
mask=[False],
fill_value='?',
dtype=object),
'param_n_estimators': masked_array(data=[150],
mask=[False],
fill_value='?',
dtype=object),
'param_num_leaves': masked_array(data=[15000],
mask=[False],
fill_value='?',
dtype=object),
'param_random_state': masked_array(data=[6],
mask=[False],
fill_value='?',
dtype=object),
'params': [{'colsample_bytree': 0.2,
'learning_rate': 0.1,
'max_depth': -1,
'min_child_samples': 3000,
'min_child_weight': 0,
'n_estimators': 150,
'num_leaves': 15000,
'random_state': 6}],
'split0_test_score': array([0.75898716]),
'mean_test_score': array([0.75898716]),
'std_test_score': array([0.]),
'rank_test_score': array([1], dtype=int32),
'split0_train_score': array([0.81224109]),
'mean_train_score': array([0.81224109]),
'std_train_score': array([0.])}
So it correctly reports the same value for split0_test_score and mean_test_score: 0.75898716
But then when I try this:
gs_algorithm.score(X.iloc[cv_split[0][1]],y[cv_split[0][1]])
0.8194048788870386
y_pred = gs_algorithm.predict_proba(X)[:, 1]
print(y_pred[cv_split[0][1]].shape)
roc_auc_score(y[cv_split[0][1]], y_pred[cv_split[0][1]])
(15847,)
0.8194048788870386
Why is the score computed after fitting different from the mean_test_score reported in cv_results_?
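The score and predict_proba methods of GridSearchCV (your gs_algorithm) use best_estimator_, a model refitted on the entire dataset passed to fit (the train and test folds of the split combined); see the documentation for the refit parameter. The rows in your test fold were therefore part of that model's training data, which is why the score on them (0.819) is higher than split0_test_score (0.759).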
The estimators fitted on individual folds aren't saved, so to recreate the test fold score you would need to manually refit the estimator with best_params_ on the training fold only (with random effects controlled, e.g. a fixed random_state) and score it on the test fold.
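As a rough illustration of the point above, here is a minimal sketch. It assumes synthetic data from make_classification and a RandomForestClassifier as a stand-in for your LGBMClassifier; the parameter grid and sample sizes are made up for the example. It compares the stored fold score, the score from the refitted model, and the score from a manual refit on the training fold only.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption: the question's X and y are not available).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# A single predefined split, built the same way as cv_split in the question.
train_idx = np.arange(1000)
test_idx = np.arange(1000, 2000)
cv_split = [(train_idx, test_idx)]

# Stand-in estimator; the fixed random_state makes refits reproducible.
estimator = RandomForestClassifier(random_state=6)
param_grid = {"n_estimators": [50]}  # illustrative grid with one candidate

gs = GridSearchCV(estimator, param_grid, scoring="roc_auc",
                  cv=cv_split, refit=True)
gs.fit(X, y)

# Score recorded during the search: model fit on train_idx, scored on test_idx.
cv_score = gs.cv_results_["split0_test_score"][0]

# gs.score uses best_estimator_, which was refit on ALL rows of X,
# so the test fold rows were seen during training.
refit_score = gs.score(X[test_idx], y[test_idx])

# Manually recreate the fold model: same parameters, training fold only.
fold_model = clone(gs.estimator).set_params(**gs.best_params_)
fold_model.fit(X[train_idx], y[train_idx])
manual_score = roc_auc_score(y[test_idx],
                             fold_model.predict_proba(X[test_idx])[:, 1])

print("cv_results_ test score:", cv_score)
print("score from refit model:", refit_score)
print("manual fold refit     :", manual_score)
```

If you don't need the refitted model at all, passing refit=False avoids the extra fit, but then score and predict_proba are not available on the GridSearchCV object.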