Hi, I am relatively new to Python and AI, and I was trying to interpret my F1 scores. I realized that if I calculate the F1 score manually as F1 = 2TP / (2TP + FP + FN) from my confusion matrix, it differs from what sklearn.metrics returns.
This is my code:
import math
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, f1_score, classification_report

dataset = pd.read_csv('diabetes-data.csv')

# Replace impossible zeros with the column mean
zero_not_accepted = ['Glucose', 'BloodPressure', 'SkinThickness', 'BMI', 'Insulin']
for column in zero_not_accepted:
    dataset[column] = dataset[column].replace(0, np.nan)
    mean = int(dataset[column].mean(skipna=True))
    dataset[column] = dataset[column].replace(np.nan, mean)

X = dataset.iloc[:, 0:8]
y = dataset.iloc[:, 8]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, test_size=0.2)

sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)

math.sqrt(len(y_test))  # ~12.4, used to pick an odd n_neighbors (11)
classifier = KNeighborsClassifier(n_neighbors=11, p=2, metric="euclidean")
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
My final confusion matrix is:
[[94 13]
 [15 32]]
This is where it gets confusing: if I calculate the F1 score manually, I get 0.8704. However, f1_score(y_test, y_pred) returns 0.6956. Can anyone please explain what the issue is?
Additional information: I tried printing classification_report(y_test, y_pred) and this is the output:
Classification Report:
              precision    recall  f1-score   support

           0       0.86      0.88      0.87       107
           1       0.71      0.68      0.70        47

    accuracy                           0.82       154
   macro avg       0.79      0.78      0.78       154
weighted avg       0.82      0.82      0.82       154
The order of the numbers in scikit-learn's confusion matrix is not the one you may expect from your books or lectures. For scikit-learn, the matrix is laid out as:

TN  FP
FN  TP

So with F1 = 2TP / (2TP + FP + FN):

F1 = 2*32 / (2*32 + 13 + 15)
F1 ≈ 0.6957

which is the correct answer. You did the calculation as if the numbers were ordered:

TP  FP
FN  TN

F1 = 2*94 / (2*94 + 13 + 15)
F1 ≈ 0.8704

which is wrong, because scikit-learn's matrix is not in that order. Note also that f1_score reports the F1 of the positive class (label 1) by default; the 0.8704 you computed is actually the F1 of class 0, which is why it matches the 0.87 in the first row of your classification report.
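You can verify the ordering yourself with a small sketch. The labels below are synthetic, constructed only to reproduce the matrix from the question; cm.ravel() unpacks the entries in scikit-learn's TN, FP, FN, TP order:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score

# Synthetic labels reproducing the matrix [[94 13] [15 32]]:
# 107 class-0 samples (94 predicted 0, 13 predicted 1)
# 47 class-1 samples (15 predicted 0, 32 predicted 1)
y_true = np.array([0] * 107 + [1] * 47)
y_pred = np.array([0] * 94 + [1] * 13 + [0] * 15 + [1] * 32)

# ravel() flattens row by row: TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 94 13 15 32

# Manual F1 with the correct unpacking matches f1_score
manual_f1 = 2 * tp / (2 * tp + fp + fn)
print(round(manual_f1, 4))                  # 0.6957
print(round(f1_score(y_true, y_pred), 4))   # 0.6957

# The 0.8704 figure is the F1 of class 0, available via pos_label
print(round(f1_score(y_true, y_pred, pos_label=0), 4))  # 0.8704
```

Unpacking with ravel() is the safest way to avoid mixing up the quadrants, since it makes the TN/FP/FN/TP assignment explicit.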