python pandas confusion-matrix false-positive

Calculate a set of performance metrics from signals in a pandas column

I have a dataframe containing three of my signals as follows:

rr_manually_cleaned is the ground truth signal
rr_noisy is the raw noisy signal
rr_filtered is the output of an anomaly detector that has removed the detected anomalies from rr_noisy

To evaluate the performance of the anomaly detector, I want to count the false positives (FP), false negatives (FN), etc.

FN (false negative): a data point exists in rr_filtered but is NaN in rr_manually_cleaned, meaning the anomaly detector failed to detect an anomaly.
FP (false positive): a data point exists in rr_manually_cleaned but is NaN in rr_filtered, meaning the anomaly detector flagged a point that is not an anomaly.

Given this setup, what is the best way to calculate FP, FN, and other relevant performance metrics (F1, precision, recall, etc.)? Is it possible to build a confusion matrix directly from these columns?

Solution

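You can label each row with isna() tests: a row is FN when rr_filtered has a value but rr_manually_cleaned is NaN, and FP in the opposite case. Rows that match neither condition are left as NaN in the new TEST column.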

df.loc[(~df['rr_filtered'].isna()) & (df['rr_manually_cleaned'].isna()), 'TEST'] = 'FN'

df.loc[(~df['rr_manually_cleaned'].isna()) & (df['rr_filtered'].isna()), 'TEST'] = 'FP'
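
The same isna() masks can be extended to the other two cells of the confusion matrix and to the usual metrics. The following is only a sketch under some assumptions that are not stated in the question: "anomaly" is treated as the positive class, a TP is a row that is NaN in both rr_manually_cleaned and rr_filtered, a TN is a row that has a value in both, and every row corresponds to a sample that exists in rr_noisy. The toy values and the names actual, predicted, and confusion are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; NaN marks a sample that was removed as an anomaly.
df = pd.DataFrame({
    "rr_manually_cleaned": [0.80, np.nan, 0.82, np.nan, 0.79, 0.81],
    "rr_filtered":         [0.80, np.nan, np.nan, 0.95, 0.79, 0.81],
})

# Assumption: positive class = "anomaly", i.e. the value is NaN in that column.
actual = df["rr_manually_cleaned"].isna()   # ground truth says anomaly
predicted = df["rr_filtered"].isna()        # detector says anomaly

# Count the four cells of the confusion matrix.
tp = int((actual & predicted).sum())     # anomaly, and detected
fp = int((~actual & predicted).sum())    # not an anomaly, but removed
fn = int((actual & ~predicted).sum())    # anomaly, but kept
tn = int((~actual & ~predicted).sum())   # not an anomaly, and kept

# Guard against division by zero when a class never occurs.
precision = tp / (tp + fp) if (tp + fp) else float("nan")
recall = tp / (tp + fn) if (tp + fn) else float("nan")
f1 = (2 * precision * recall / (precision + recall)
      if (precision + recall) else float("nan"))

# 2x2 table: rows are the ground truth, columns are the detector output.
confusion = pd.crosstab(actual.rename("actual_anomaly"),
                        predicted.rename("predicted_anomaly"))

print(confusion)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

If rr_noisy itself contains NaN rows, it is probably worth dropping them first, since otherwise they would be counted as TP under the assumptions above.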