Calculate accuracy, precision, recall, f1 score on K-Fold Cross Validation


This is my Python code to calculate accuracy, precision, recall, and F1 score with K-Fold Cross Validation.

In my code I sum the accuracy, precision, recall, and F1 score from each fold, then divide each total by n_folds. But I don't know whether this is the right way to calculate those scores. How can I tell?

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.model_selection import train_test_split

# running totals for each metric
a = 0
p = 0
r = 0
f = 0
for fold in range(0, n_folds):
    # splitting the dataset (a new random split on every iteration)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=int(len(y)/n_folds))

    clf.fit(X_train, y_train)

    y_pred = clf.predict(X_test)

    # scikit-learn metrics take (y_true, y_pred), in that order
    a = a + accuracy_score(y_test, y_pred)
    p = p + precision_score(y_test, y_pred)
    r = r + recall_score(y_test, y_pred)
    f = f + f1_score(y_test, y_pred)
# average over the folds; the totals stay in a, p, r, f so the imported
# metric functions are not overwritten
print("accuracy score :", a / n_folds)
print("precision score :", p / n_folds)
print("recall score :", r / n_folds)
print("f1 score :", f / n_folds)

Solution

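  • There is a function that handles cross-validation for you: cross_validate. Your approach of averaging the per-fold scores is sound, and cross_validate reports the same per-fold values. One caveat: calling train_test_split inside the loop draws a new random split each time, so the test sets can overlap, whereas K-fold cross-validation uses disjoint folds.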

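    Note that it is not a good idea to use your entire dataset for cross-validation: hold out a final test set first and cross-validate on the training portion only, as in the example below. See the scikit-learn documentation on evaluating estimator performance for details: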

    import pandas as pd
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split, cross_validate
    
    n_folds = 5
    
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    
    clf = LogisticRegression(max_iter=1000, random_state=42)
    
    scoring = ['accuracy', 'precision_macro', 'recall_macro', 'f1_macro']
    scores = cross_validate(clf, X_train, y_train, cv=n_folds, scoring=scoring, return_train_score=True)
    df_scores = pd.DataFrame(scores)
    

    Output:

    >>> df_scores
       fit_time  score_time  test_accuracy  train_accuracy  test_precision_macro  train_precision_macro  test_recall_macro  train_recall_macro  test_f1_macro  train_f1_macro
    0  0.012872    0.004308       1.000000        0.958333              1.000000               0.959477           1.000000            0.958333       1.000000        0.958293
    1  0.009851    0.004276       1.000000        0.968750              1.000000               0.969281           1.000000            0.968394       1.000000        0.968681
    2  0.009777    0.003775       0.875000        1.000000              0.909091               1.000000           0.875000            1.000000       0.870445        1.000000
    3  0.009764    0.004038       1.000000        0.979167              1.000000               0.979798           1.000000            0.979798       1.000000        0.979167
    4  0.010602    0.003765       0.958333        0.968750              0.962963               0.968750           0.958333            0.969045       0.958170        0.968742
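
The question asks for one averaged figure per metric, while the table above lists one row per fold. A minimal sketch of collapsing it, assuming the same iris data and LogisticRegression setup as the answer's example (the names test_cols and summary are illustrative, not part of the original answer):

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = LogisticRegression(max_iter=1000, random_state=42)
scoring = ['accuracy', 'precision_macro', 'recall_macro', 'f1_macro']
scores = cross_validate(clf, X_train, y_train, cv=5, scoring=scoring)
df_scores = pd.DataFrame(scores)

# Keep only the held-out (test_*) columns, then take the mean and the
# standard deviation across the folds for each metric.
test_cols = [c for c in df_scores.columns if c.startswith('test_')]
summary = df_scores[test_cols].agg(['mean', 'std']).T
print(summary)
```

The mean column corresponds to the "sum, then divide by n_folds" calculation in the question; the std column gives a rough sense of how much the score varies from fold to fold.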
    

    See the scikit-learn documentation for the other predefined scoring values.
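
If you prefer to keep an explicit loop like the one in the question, a splitter object can produce disjoint folds instead of repeated random splits. The following is only a sketch under some assumptions: it reuses the iris data and LogisticRegression from the example above, picks StratifiedKFold with shuffling as the splitter, and uses macro averaging because iris has three classes. The fold_scores dictionary is an illustrative name.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import StratifiedKFold, train_test_split

n_folds = 5
X, y = load_iris(return_X_y=True)
# Hold out a final test set; cross-validate on the training portion only.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = LogisticRegression(max_iter=1000, random_state=42)

# Every training sample lands in exactly one validation fold.
skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=42)

fold_scores = {'accuracy': [], 'precision': [], 'recall': [], 'f1': []}
for train_idx, val_idx in skf.split(X_train, y_train):
    model = clone(clf)  # fresh, unfitted copy for every fold
    model.fit(X_train[train_idx], y_train[train_idx])
    y_true = y_train[val_idx]
    y_pred = model.predict(X_train[val_idx])

    # Metric functions take (y_true, y_pred), in that order.
    fold_scores['accuracy'].append(accuracy_score(y_true, y_pred))
    fold_scores['precision'].append(precision_score(y_true, y_pred, average='macro'))
    fold_scores['recall'].append(recall_score(y_true, y_pred, average='macro'))
    fold_scores['f1'].append(f1_score(y_true, y_pred, average='macro'))

# Average each metric over the folds.
for name, values in fold_scores.items():
    print(f"{name} score : {np.mean(values):.3f}")
```

Collecting the per-fold values in lists rather than running totals makes it easy to inspect individual folds. The numbers will not match the cross_validate table exactly, since this sketch shuffles before splitting.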