python artificial-intelligence knn valueerror

ValueError in kNN metrics


I have a project that applies the kNN algorithm to a CSV file and displays selected metrics. When I try to print some of those metrics, I get several errors.

When I try to compute sensitivity, F1 score and precision:

  1. Sensitivity - print(metrics.recall_score(y_test, y_pred_class))
  2. F1 score - print(metrics.f1_score(y_test, y_pred_class))
  3. Precision - print(metrics.precision_score(y_test, y_pred_class))

PyCharm throws the following error:

ValueError: Target is multiclass but average='binary'. Please choose another average setting

The error when I try to print the ROC curve is a little different:

ValueError: multiclass format is not supported


DATASET

LINK TO DATASET: https://www.dropbox.com/s/yt3n1eqxlsb816n/Testfile%20-%20kNN.csv?dl=0

Program

import pandas as pd
import numpy as np
import math
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Tools for testing
from sklearn import metrics
from sklearn.metrics import confusion_matrix
from sklearn.metrics import f1_score
from sklearn.metrics import accuracy_score

def main():
    dataset = pd.read_csv('filetestKNN.csv')

    X = dataset.drop(columns=['Label'])
    y = dataset['Label'].values

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, test_size=0.34)

    Classifier = KNeighborsClassifier(n_neighbors=2, p=2, metric='euclidean')
    Classifier.fit(X_train, y_train)

    y_pred_class = Classifier.predict(X_test)
    y_pred_prob = Classifier.predict_proba(X_test)[:, 1]

    accuracy = Classifier.score(X_test, y_test)

    confusion = metrics.confusion_matrix(y_test, y_pred_class)

    print()
    print("Accuracy")
    print(metrics.accuracy_score(y_test, y_pred_class))
    print()
    print("Classification Error")
    print(1 - metrics.accuracy_score(y_test, y_pred_class))
    print()
    print("Confusion matrix")
    print(metrics.confusion_matrix(y_test, y_pred_class))
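    # ValueError: Target is multiclass but average='binary'
    print(metrics.recall_score(y_test, y_pred_class))
    # ValueError: multiclass format is not supported
    print(metrics.roc_curve(y_test, y_pred_class))
    # ValueError: Target is multiclass but average='binary'
    print(metrics.f1_score(y_test, y_pred_class))
    # ValueError: Target is multiclass but average='binary'
    print(metrics.precision_score(y_test, y_pred_class))

if __name__ == '__main__':
    main()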

I just want to print the algorithm's metrics to the screen.


Solution

  • You need to pass the average keyword argument to these sklearn.metrics functions. For example, look at the documentation of f1_score. Here is the part describing the average keyword argument:

    average : string, [None, ‘binary’ (default), ‘micro’, ‘macro’, ‘samples’, ‘weighted’]

    This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data:

    'binary':
      Only report results for the class specified by pos_label. This is applicable only if targets (y_{true,pred}) are binary.
    'micro':
      Calculate metrics globally by counting the total true positives, false negatives and false positives.
    'macro':
      Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
    'weighted':
      Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.
    'samples':
      Calculate metrics for each instance, and find their average (only meaningful for multilabel classification where this differs from accuracy_score).

    As you can see, this parameter controls how results are aggregated over the different labels in your multiclass task. I'm not sure which one you'd like to use, but micro is a reasonable starting point. Here's how your call to f1_score would look with this choice:

    print(metrics.f1_score(y_test, y_pred_class, average='micro'))
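To get a feel for how the averaging options differ, here is a small sketch on made-up 3-class labels (the toy `y_true` and `y_pred` lists are my own assumption, not your dataset). It computes the per-class F1 scores and the three common averages. For single-label multiclass targets like these, the micro average should come out equal to plain accuracy, which is worth keeping in mind when choosing.

```python
from sklearn.metrics import accuracy_score, f1_score

# Toy 3-class labels, made up for illustration (not the asker's data).
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 2, 2, 2]

# average=None returns one F1 score per class instead of a single number.
per_class = f1_score(y_true, y_pred, average=None)

# 'micro' pools true/false positives and negatives over all classes.
micro = f1_score(y_true, y_pred, average='micro')

# 'macro' is the unweighted mean of the per-class scores.
macro = f1_score(y_true, y_pred, average='macro')

# 'weighted' weights each per-class score by that class's support.
weighted = f1_score(y_true, y_pred, average='weighted')

acc = accuracy_score(y_true, y_pred)

print("per class:", per_class)
print("micro:", micro, "macro:", macro, "weighted:", weighted)
print("accuracy:", acc)
```

If your classes are imbalanced and you care about the rare ones, macro is probably the more informative choice, since micro is dominated by the frequent classes.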
    

    You can adjust the other metrics (recall_score and precision_score) in the same way. Hope this helps.
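The ROC error is a separate issue: roc_curve only handles binary targets and has no average argument, and it expects scores or probabilities rather than predicted class labels. One common workaround is a one-vs-rest approach, sketched below. Note the assumptions: the data is synthetic (make_classification stands in for your CSV), n_neighbors=5 is an arbitrary choice, and the variable names are mine. The idea is to keep every column of predict_proba, binarize the labels, and compute one curve per class.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import label_binarize

# Synthetic 3-class data standing in for the CSV (an assumption).
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           n_redundant=0, n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, test_size=0.34, stratify=y)

clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_train, y_train)

# Keep all probability columns: one per class, ordered like clf.classes_.
y_score = clf.predict_proba(X_test)

# One-vs-rest: turn the labels into one 0/1 column per class.
y_test_bin = label_binarize(y_test, classes=clf.classes_)

# One binary ROC curve per class ("this class" vs. "everything else").
roc_per_class = {}
for i, label in enumerate(clf.classes_):
    fpr, tpr, _ = roc_curve(y_test_bin[:, i], y_score[:, i])
    roc_per_class[label] = (fpr, tpr)

# A single summary number: macro-averaged one-vs-rest AUC.
macro_auc = roc_auc_score(y_test, y_score, multi_class='ovr', average='macro')
print("macro OvR AUC:", macro_auc)
```

Each (fpr, tpr) pair in roc_per_class can then be passed to plt.plot to draw that class's curve. With a very small n_neighbors the probabilities take only a few distinct values, so the curves will look coarse.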