I am dealing with a binary classification problem (class 0/1) with class imbalance. Given the vector of predictions, I would like to compute:

- the F1-Score for class 0;
- the F1-Score for class 1;
- the Weighted Average F1-Score;
- an "Inverse Weighted Average F1-Score".
The last one is a custom metric that I think could be useful: it works exactly like the Weighted Average F1-Score (the weighted average of the per-class F1-Scores), but instead of weights proportional to the support (the number of actual occurrences of each class), it uses weights inversely proportional (reciprocal) to the support. The idea is to give more importance to the F1-Score of the minority class, since I am mainly interested in predicting occurrences of this class well.
First question: is it a good idea to use this custom metric? Are there any downsides? Why can't I find anything about it on the Internet?
Second question: how to implement it in Python?
This is what I tried so far:
from sklearn.metrics import f1_score
import numpy as np
y_true = [0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 0, 0, 1, 1]
f1_class0 = f1_score(y_true, y_pred, pos_label=0)
f1_class1 = f1_score(y_true, y_pred, pos_label=1)
f1_weighted = f1_score(y_true, y_pred, average='weighted')
# attempt at the custom metric: weight each instance by 1 minus its class's proportion
class_counts = np.bincount(y_true)
class_weights = class_counts / len(y_true)
inverse_class_weights = 1 - class_weights
inverse_sample_weights = np.array([inverse_class_weights[label] for label in y_true])
f1_inverse = f1_score(y_true, y_pred, average='weighted', sample_weight=inverse_sample_weights)
print("F1-Score for class 0:", f1_class0)
print("F1-Score for class 1:", f1_class1)
print("Weighted Average F1-Score:", f1_weighted)
print("Inverse Weighted Average F1-Score:", f1_inverse)
output:
F1-Score for class 0: 0.6666666666666666
F1-Score for class 1: 0.8
Weighted Average F1-Score: 0.7666666666666667
Inverse Weighted Average F1-Score: 0.8285714285714286
The first three metrics are computed correctly, but the custom metric is not. I would expect a value of (0.6666666666666666 * 0.75) + (0.8 * 0.25) = 0.7, since the support proportions are 0.25 for class 0 and 0.75 for class 1 (so the "inverse support proportions" are 0.75 and 0.25, respectively), while I don't understand where the value 0.8285714285714286 comes from.
Can someone please help me understand what is going on? Did I make a mistake? And above all, why has no one ever developed this metric?
With the sample_weight parameter you are weighting the individual instances. That is different from weighting the per-class scores for class 0 and class 1, which is what you want to do:
class_counts = np.bincount(y_true)
# reciprocal of each class's support, normalized so the weights sum to 1
inverse_class_proportions = 1 / class_counts
inverse_class_weights = inverse_class_proportions / sum(inverse_class_proportions)
# weight each per-class F1-Score by the inverse-support weight of its class
f1_inverse = f1_class0 * inverse_class_weights[0] + f1_class1 * inverse_class_weights[1]
print("Inverse Weighted Average F1-Score:", f1_inverse)
Output:
Inverse Weighted Average F1-Score: 0.7
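As for where 0.8285714285714286 comes from: with those per-instance weights, the weighted supports of the two classes become equal (2 · 0.75 = 6 · 0.25 = 1.5), so average='weighted' collapses to a plain mean of the per-class F1-Scores, which themselves were also recomputed with weighted TP/FP/FN counts. A minimal sketch reproducing that value:

```python
import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([0, 0, 1, 1, 1, 1, 1, 1])
y_pred = np.array([0, 0, 1, 1, 0, 0, 1, 1])

# the per-instance weights built in the question:
# 0.75 for class-0 rows, 0.25 for class-1 rows
w = np.where(y_true == 0, 0.75, 0.25)

# sample_weight reweights TP/FP/FN, so the per-class F1-Scores themselves change
f1_per_class = f1_score(y_true, y_pred, average=None, sample_weight=w)

# ...and so do the supports used by 'weighted': both become 1.5, i.e. equal
weighted_support = np.array([w[y_true == 0].sum(), w[y_true == 1].sum()])

f1 = np.average(f1_per_class, weights=weighted_support)
print(f1)  # ≈ 0.8285714285714286, the puzzling value from the question
```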
It is not that common because, when the classes are extremely imbalanced, it gives disproportionate weight to the minority class: a handful of minority instances can then dominate the metric.
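To make that concrete, here is a quick illustration with invented supports (999 majority instances vs. 1 minority instance); the inverse-support weights put almost all of the mass on the minority class:

```python
import numpy as np

# hypothetical supports: 999 majority instances, 1 minority instance
class_counts = np.array([999, 1])
inverse = 1 / class_counts
weights = inverse / inverse.sum()
print(weights)  # ≈ [0.001, 0.999]: the minority-class F1 alone nearly determines the metric
```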