My aim here is to build one pipeline per model that handles preprocessing, run nested cross-validation to prevent information leakage, and then compare the performances and pick the best model.
Questions:
#Imports
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, recall_score
from sklearn.model_selection import GridSearchCV, cross_validate, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

#Organising data
X = df.drop('target', axis=1)
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
pipe_rf = Pipeline([('scl', StandardScaler()),
('clf', RandomForestClassifier(random_state=42))])
param_range = list(range(1, 11))
grid_params_rf = [{'clf__criterion': ['gini', 'entropy'],
'clf__min_samples_leaf': param_range,
'clf__max_depth': param_range,
'clf__min_samples_split': param_range[1:]}]
gs_rf = GridSearchCV(estimator=pipe_rf,
                     param_grid=grid_params_rf,
                     scoring=['accuracy', 'f1', 'recall'],
                     refit='accuracy',
                     cv=10,
                     n_jobs=-1,
                     return_train_score=True)  #needed for the mean_train_* keys in cv_results_
gs_rf.fit(X_train, y_train)
#Get the cross-validated scores for the best parameters:
gs_rf.best_params_
gs_rf.best_score_ #mean cross-validated accuracy (a validation score, not a training score)
#train f1 and train recall live in cv_results_ (see below); they are only recorded
#when GridSearchCV is created with return_train_score=True
#Find out how well it generalises by predicting on X_test and comparing predictions to y_test
y_predict = gs_rf.predict(X_test)
accuracy_score(y_test, y_predict) #test accuracy
recall_score(y_test, y_predict) #test recall
f1_score(y_test, y_predict) #test f1
#Evaluating the model (using this value to compare all of my different models, e.g. RF, SVM, DT)
scor = cross_validate(gs_rf, X_test, y_test, scoring=['accuracy', 'f1', 'recall'], cv=5, n_jobs=-1)
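Since the stated goal is nested cross-validation, here is a minimal sketch of the fully nested version: GridSearchCV as the inner tuning loop, wrapped in cross_validate as the outer evaluation loop, so no outer fold ever influences hyperparameter selection. It runs on hypothetical synthetic data standing in for df, with a deliberately small grid and forest to keep it fast:

```python
# Nested CV sketch: inner loop tunes, outer loop estimates generalisation.
# make_classification stands in for the real df; grid and forest sizes are
# kept small purely for illustration.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, random_state=0)

pipe = Pipeline([('scl', StandardScaler()),
                 ('clf', RandomForestClassifier(n_estimators=25, random_state=42))])

inner = GridSearchCV(pipe,
                     {'clf__max_depth': [2, 4, 6]},
                     scoring='accuracy',
                     cv=3)                      # inner loop: hyperparameter tuning

outer = cross_validate(inner, X, y,
                       scoring=['accuracy', 'f1', 'recall'],
                       cv=5)                    # outer loop: unbiased performance estimate

print(outer['test_accuracy'].mean())
```

The mean of outer['test_accuracy'] (and of test_f1, test_recall) is the number to compare across models such as RF, SVM and DT, since it scores the whole tune-and-fit procedure rather than one tuned model.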
The GridSearchCV object retains the cross-validated scores for each metric in its cv_results_ attribute. Each entry there is an array with one value per parameter combination, so index it with best_index_ to get the scores of the best estimator; note that the mean_train_* keys only exist when the search was created with return_train_score=True. Here's how you can access the training F1 and recall scores:
f1_train_scores = gs_rf.cv_results_['mean_train_f1']
recall_train_scores = gs_rf.cv_results_['mean_train_recall']
f1_train_best = f1_train_scores[gs_rf.best_index_]
recall_train_best = recall_train_scores[gs_rf.best_index_]
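To make that extraction concrete, here is a self-contained sketch on hypothetical synthetic data (the grid and forest are kept tiny for speed); the key details are return_train_score=True at construction and best_index_ to pick out the winning candidate's row:

```python
# Pulling the best candidate's cross-validated scores out of cv_results_.
# make_classification stands in for the real df.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=150, random_state=0)

gs = GridSearchCV(RandomForestClassifier(n_estimators=25, random_state=42),
                  {'max_depth': [2, 4]},
                  scoring=['accuracy', 'f1', 'recall'],
                  refit='accuracy',
                  cv=3,
                  return_train_score=True)   # without this, mean_train_* keys are absent
gs.fit(X, y)

i = gs.best_index_  # row of the parameter combination that won on the refit metric
val_f1 = gs.cv_results_['mean_test_f1'][i]           # cross-validated (validation) f1
val_recall = gs.cv_results_['mean_test_recall'][i]   # cross-validated (validation) recall
train_f1 = gs.cv_results_['mean_train_f1'][i]        # training f1
train_recall = gs.cv_results_['mean_train_recall'][i]
```

The mean_test_* entries are the validation scores averaged over folds; comparing them with the mean_train_* entries is a quick check for overfitting of the best candidate.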