I am currently trying to implement an anomaly detection algorithm with scikit-learn in Python. I relabeled the dataset so that inliers (normal instances) are labelled 1, while outliers (anomalous instances) are labelled -1 (Reference).
When calculating accuracy_score, precision_score, recall_score and f1_score, I get different values depending on whether I set pos_label=1 or pos_label=-1.
So what is the label of the positive class in the context of anomaly detection: 1 or -1?
You are interested in finding which samples are outliers, so the positive class is the outliers. With your labelling, that means pos_label=-1.
Note: Generally, you should try to improve recall rather than precision, because the errors you most need to reduce are false negatives (predicting that an outlier is an inlier).
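As a minimal sketch of the difference, the snippet below computes the metrics with the outlier label as the positive class and, for comparison, with the inlier label. The toy y_true and y_pred arrays and the helper name outlier_metrics are made up for illustration and are not from the question; only precision_score, recall_score and f1_score are actual scikit-learn functions.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical toy data: 1 = inlier (normal), -1 = outlier (anomaly).
y_true = [1, 1, 1, 1, 1, 1, -1, -1, -1, -1]
y_pred = [1, 1, 1, 1, 1, -1, -1, -1, 1, 1]


def outlier_metrics(y_true, y_pred, pos_label=-1):
    """Return precision, recall and F1 for the class given by pos_label.

    Illustrative helper (not part of scikit-learn); by default the
    outlier label -1 is treated as the positive class.
    """
    return {
        "precision": precision_score(y_true, y_pred, pos_label=pos_label),
        "recall": recall_score(y_true, y_pred, pos_label=pos_label),
        "f1": f1_score(y_true, y_pred, pos_label=pos_label),
    }


# Outliers as the positive class: how well are anomalies detected?
outlier_scores = outlier_metrics(y_true, y_pred, pos_label=-1)

# Inliers as the positive class: dominated by the majority class.
inlier_scores = outlier_metrics(y_true, y_pred, pos_label=1)

print("pos_label=-1:", outlier_scores)
print("pos_label=1: ", inlier_scores)
```

On this toy data the scores for pos_label=1 come out noticeably higher, because most inliers are classified correctly, even though half of the outliers are missed. The recall for pos_label=-1 is the number that reflects those missed outliers. As a side note, accuracy_score takes no pos_label argument and does not depend on which class is considered positive.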