I am dealing with a binary classification problem (class 0/1) with class imbalance. Given the vector of predictions, I would like to compute:

- the F1-Score for class 0;
- the F1-Score for class 1;
- the Weighted Average F1-Score;
- an "Inverse Weighted Average F1-Score".
The last one is a custom metric that I think could be useful: it works exactly like the Weighted Average F1-Score (the weighted average of the per-class F1-Scores), but instead of weights proportional to the support (the number of actual occurrences of each class), it uses weights inversely proportional (reciprocal) to the support. The idea is to give more importance to the F1-Score of the minority class, since I am mainly interested in predicting occurrences of this class well.
First question: is it a good idea to use this custom metric? Are there any downsides? Why can't I find anything about it on the Internet?
Second question: how to implement it in Python?
This is what I tried so far:
from sklearn.metrics import f1_score
import numpy as np
y_true = [0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 0, 0, 1, 1]
f1_class0 = f1_score(y_true, y_pred, pos_label=0)
f1_class1 = f1_score(y_true, y_pred, pos_label=1)
f1_weighted = f1_score(y_true, y_pred, average='weighted')
# attempt at the custom metric: weight each instance by 1 minus its class's proportion
class_counts = np.bincount(y_true)
class_weights = class_counts / len(y_true)
inverse_class_weights = 1 - class_weights
inverse_sample_weights = np.array([inverse_class_weights[label] for label in y_true])
f1_inverse = f1_score(y_true, y_pred, average='weighted', sample_weight=inverse_sample_weights)
print("F1-Score for class 0:", f1_class0)
print("F1-Score for class 1:", f1_class1)
print("Weighted Average F1-Score:", f1_weighted)
print("Inverse Weighted Average F1-Score:", f1_inverse)
output:
F1-Score for class 0: 0.6666666666666666
F1-Score for class 1: 0.8
Weighted Average F1-Score: 0.7666666666666667
Inverse Weighted Average F1-Score: 0.8285714285714286
The first three metrics are computed correctly, but the custom metric is not. I would expect a value of (0.6666666666666666 * 0.75) + (0.8 * 0.25) = 0.7, since the support proportions are 0.25 for class 0 and 0.75 for class 1 (so the "inverse support proportions" are 0.75 and 0.25, respectively), while I don't understand where the value 0.8285714285714286 comes from.
Can someone please help me understand what is going on? Did I make a mistake? And above all, why has no one ever developed this metric?
With the sample_weight parameter you are weighting the individual instances. That is different from weighting the per-class scores for class 0 and class 1, which is what you want to do:
class_counts = np.bincount(y_true)
# reciprocal of each class's support, normalized so the weights sum to 1
inverse_class_proportions = 1 / class_counts
inverse_class_weights = inverse_class_proportions / sum(inverse_class_proportions)
# weight each per-class F1-Score by the inverse-support weight of its class
f1_inverse = f1_class0 * inverse_class_weights[0] + f1_class1 * inverse_class_weights[1]
print("Inverse Weighted Average F1-Score:", f1_inverse)
Output:
Inverse Weighted Average F1-Score: 0.7
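As for where 0.8285714285714286 comes from: with those per-instance weights, the weighted supports of the two classes become equal (2 · 0.75 = 6 · 0.25 = 1.5), so average='weighted' collapses to a plain mean of the per-class F1-Scores, which themselves were also recomputed with weighted TP/FP/FN counts. A minimal sketch reproducing that value:

```python
import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([0, 0, 1, 1, 1, 1, 1, 1])
y_pred = np.array([0, 0, 1, 1, 0, 0, 1, 1])

# the per-instance weights built in the question:
# 0.75 for class-0 rows, 0.25 for class-1 rows
w = np.where(y_true == 0, 0.75, 0.25)

# sample_weight reweights TP/FP/FN, so the per-class F1-Scores themselves change
f1_per_class = f1_score(y_true, y_pred, average=None, sample_weight=w)

# ...and so do the supports used by 'weighted': both become 1.5, i.e. equal
weighted_support = np.array([w[y_true == 0].sum(), w[y_true == 1].sum()])

f1 = np.average(f1_per_class, weights=weighted_support)
print(f1)  # ≈ 0.8285714285714286, the puzzling value from the question
```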
It is not that common because, when the classes are extremely imbalanced, it gives disproportionate weight to the minority class: a handful of minority instances can then dominate the metric.
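To make that concrete, here is a quick illustration with invented supports (999 majority instances vs. 1 minority instance); the inverse-support weights put almost all of the mass on the minority class:

```python
import numpy as np

# hypothetical supports: 999 majority instances, 1 minority instance
class_counts = np.array([999, 1])
inverse = 1 / class_counts
weights = inverse / inverse.sum()
print(weights)  # ≈ [0.001, 0.999]: the minority-class F1 alone nearly determines the metric
```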