python python-3.x machine-learning scikit-learn metrics

Macro VS Micro VS Weighted VS Samples F1 Score

In sklearn.metrics.f1_score, the F1 score has a parameter called "average". What do macro, micro, weighted, and samples mean? Please elaborate, because the documentation does not explain them properly. Or simply answer the following:

Why is "samples" the best parameter for multilabel classification?
Why is "micro" the best for an imbalanced dataset?
What's the difference between "weighted" and "macro"?

Solution

The question is about the meaning of the average parameter in sklearn.metrics.f1_score.

As you can see from the scikit-learn source code:

average=micro tells the function to compute F1 from the total true positives, false negatives, and false positives, pooled across all labels (regardless of which label each prediction belongs to).
average=macro tells the function to compute F1 for each label and return the unweighted mean, ignoring how often each label occurs in the dataset.
average=weighted tells the function to compute F1 for each label and return the mean weighted by each label's support (its number of true instances in the dataset).
average=samples tells the function to compute F1 for each instance and return the mean. It is meant for multilabel classification.
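
To make the difference concrete, here is a minimal hand-rolled sketch of the four averaging modes in plain Python. It is not scikit-learn's implementation: the helper names (f1_from_counts, per_label_counts, row_f1) and the toy data are made up for illustration. For inputs like these, with no zero-division edge cases, the values should agree with f1_score(y_true, y_pred, average=...).

```python
from collections import Counter

def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 if the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def per_label_counts(y_true, y_pred):
    """Return {label: (tp, fp, fn)} for single-label (multiclass) targets."""
    counts = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        counts[label] = (tp, fp, fn)
    return counts

# Hypothetical imbalanced multiclass data: label 0 is the majority class.
y_true = [0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 1, 1, 2, 2]

counts = per_label_counts(y_true, y_pred)
per_label_f1 = {label: f1_from_counts(*c) for label, c in counts.items()}
support = Counter(y_true)  # number of true instances per label

# micro: pool TP/FP/FN over all labels, then compute a single F1
micro = f1_from_counts(*(sum(col) for col in zip(*counts.values())))
# macro: plain mean of the per-label F1 scores (every label counts equally)
macro = sum(per_label_f1.values()) / len(per_label_f1)
# weighted: mean of the per-label F1 scores, weighted by support
weighted = sum(per_label_f1[l] * support[l] for l in per_label_f1) / len(y_true)

# Hypothetical multilabel data: one row per instance, one column per label.
Y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
Y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 1]]

def row_f1(true_row, pred_row):
    """F1 of a single instance, computed over its label indicators."""
    tp = sum(t == 1 and p == 1 for t, p in zip(true_row, pred_row))
    fp = sum(t == 0 and p == 1 for t, p in zip(true_row, pred_row))
    fn = sum(t == 1 and p == 0 for t, p in zip(true_row, pred_row))
    return f1_from_counts(tp, fp, fn)

# samples: F1 per instance (row), then the mean over instances
samples = sum(row_f1(t, p) for t, p in zip(Y_true, Y_pred)) / len(Y_true)

print(round(micro, 3), round(macro, 3), round(weighted, 3), round(samples, 3))
# prints: 0.714 0.675 0.728 0.722
```

In this toy data the minority labels 1 and 2 have lower per-label F1 than the majority label 0, so macro (which counts every label equally) comes out lowest, while weighted is pulled up toward the majority label's score.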