Tags: python, scikit-learn, cross-validation, xgboost

Manually replicating cross_val_score leads to strange results when training a toy XGBoost model


I tried to replicate the result of cross_val_score() while hyperparameter-tuning a toy XGBoost model.

I used Code No. 1 to run cross-validation as a benchmark, and then used Code No. 2 and Code No. 3 to replicate the CV result by programming the cross-validation loop manually.

The major difference between Code No. 2 and Code No. 3 is that the XGBoost classifier is initialized inside the for loop in Code No. 2 but outside the for loop in Code No. 3. I expected only Code No. 2 (the inside-the-loop version) to reproduce what the automatic cross_val_score gave. To my surprise, all three versions produce the same result (see the sanity check after Code No. 3).

My questions are: Shouldn't we clone the model for each fold, as is done inside the source code of cross_val_score? And in Code No. 3, aren't the trained XGBoost models non-independent across folds? Non-independence is not in the spirit of cross-validation, is it? So why do all three versions give identical results?
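
For reference, my understanding is that the per-fold logic inside cross_validate (which cross_val_score wraps) boils down to something like the sketch below. This is a simplification, not the actual source, and it borrows model, kfold, X and Y from Code No. 1 further down (assuming they are numpy arrays):

from sklearn.base import clone

# Simplified sketch of the per-fold logic in cross_validate -- not the real source.
# Each fold trains a fresh, unfitted copy produced by clone().
scores = []
for train_index, valid_index in kfold.split(X, Y):
    fold_model = clone(model)                    # same hyperparameters, no fitted state
    fold_model.fit(X[train_index], Y[train_index])
    scores.append(fold_model.score(X[valid_index], Y[valid_index]))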

Code No. 1

from xgboost import XGBClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

params = {
    'objective': 'binary:logistic',
    'eval_metric': 'auc'
}
model = XGBClassifier(**params)
# shuffle=True is required for random_state to take effect (newer sklearn raises an error otherwise)
kfold = StratifiedKFold(n_splits=N_SPLITS, shuffle=True, random_state=SEED)
results = cross_val_score(model, X, Y, scoring='accuracy', cv=kfold) # only a single metric is permitted; the model is cloned, not carried over, across folds
print(f'Accuracy: {results.mean()*100:.4f}% ({results.std()*100:.3f})')

Code No. 2

import numpy as np
from sklearn.metrics import accuracy_score

x_train = all_df.drop('Survived', axis=1).iloc[:train_rows].values
y_train = train_label.iloc[:train_rows].values
y_oof = np.zeros(x_train.shape[0])
acc_scores = []
kfold = StratifiedKFold(n_splits=N_SPLITS, shuffle=True, random_state=SEED)
for i, (train_index, valid_index) in enumerate(kfold.split(x_train, y_train)):
    model = XGBClassifier(**params) # <=======================================
    X_A, X_B = x_train[train_index, :], x_train[valid_index, :]
    y_A, y_B = y_train[train_index], y_train[valid_index]
    model.fit(X_A, y_A, eval_set=[(X_B, y_B)])
    y_oof[valid_index] = model.predict(X_B)
    acc_scores.append(accuracy_score(y_B, y_oof[valid_index]))

Code No. 3

x_train = all_df.drop('Survived', axis=1).iloc[:train_rows].values 
y_train = train_label.iloc[:train_rows].values
y_oof = np.zeros(x_train.shape[0])
acc_scores = []
kfold = StratifiedKFold(n_splits=N_SPLITS, shuffle=True, random_state=SEED)
model = XGBClassifier(**params) # <=======================================
for i, (train_index, valid_index) in enumerate(kfold.split(x_train, y_train)):
    X_A, X_B = x_train[train_index, :], x_train[valid_index, :]
    y_A, y_B = y_train[train_index], y_train[valid_index]
    model.fit(X_A, y_A, eval_set=[(X_B, y_B)])
    y_oof[valid_index] = model.predict(X_B)
    acc_scores.append(accuracy_score(y_B, y_oof[valid_index]))
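
As a sanity check, here is how I compared the manual loop with the cross_val_score benchmark. This assumes results from Code No. 1 and acc_scores from Code No. 2 or No. 3 are still in scope, and that X, Y in Code No. 1 are the same arrays as x_train, y_train:

# Per-fold accuracies should line up if both used the same StratifiedKFold settings.
print(np.round(results, 6))
print(np.round(acc_scores, 6))
print(f'Accuracy: {np.mean(acc_scores)*100:.4f}% ({np.std(acc_scores)*100:.3f})')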

Solution

  • When you call fit on an XGBClassifier instance (or, ideally, on any sklearn-compatible estimator), training starts over from scratch, so the models are indeed independent across folds.

    Of course, re-initializing or cloning the model is slightly safer, especially if you are not sure whether the implementation keeps any state lying around between fits. cross_val_score is a wrapper around cross_validate, and there the cloning is actually needed when return_estimator=True, because the separate fitted copies of the model have to be returned. The sketch below illustrates both points.
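
A minimal sketch, using a synthetic dataset from make_classification; the names reused, fresh, X_demo and y_demo are just for illustration:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_validate
from xgboost import XGBClassifier

X_demo, y_demo = make_classification(n_samples=500, random_state=0)

# 1) Calling fit again on the same instance discards the previous booster.
reused = XGBClassifier(objective='binary:logistic', eval_metric='auc')
reused.fit(X_demo[:100], y_demo[:100])    # first fit on a small slice
reused.fit(X_demo[100:], y_demo[100:])    # second fit: training restarts from scratch

fresh = XGBClassifier(objective='binary:logistic', eval_metric='auc')
fresh.fit(X_demo[100:], y_demo[100:])     # same data, brand-new instance

# Expected True: with identical data and default (deterministic) settings,
# the refitted instance behaves exactly like a freshly created one.
print(np.array_equal(reused.predict(X_demo), fresh.predict(X_demo)))

# 2) cross_validate keeps one fitted clone per fold when return_estimator=True.
cv_out = cross_validate(fresh, X_demo, y_demo, scoring='accuracy',
                        cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
                        return_estimator=True)
print(len(cv_out['estimator']))           # 5 independently fitted copies

If cloning were skipped, return_estimator=True could only ever hand back the final fold's model, which is exactly why cross_validate clones internally.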