Tags: python, pandas, confusion-matrix, false-positive

Calculate a set of performance metrics from signals in a pandas column


I have a dataframe containing three of my signals as follows:

  • rr_manually_cleaned is the ground truth signal

  • rr_noisy is the raw noisy signal

  • rr_filtered is the output of an anomaly detector that has removed the
    detected anomalies from rr_noisy

To evaluate the performance of the anomaly detector, I want to count the false positives (FP), false negatives (FN), and so on.

  • FN (false negative): a data point is present in rr_filtered but is NaN in rr_manually_cleaned, meaning the anomaly detector failed to detect an anomaly.

  • FP (false positive): a data point is present in rr_manually_cleaned but is NaN in rr_filtered, meaning the anomaly detector flagged a point that is not an anomaly.

Given this setup, what is the best way to calculate FP, FN, and other relevant performance metrics (precision, recall, F1, etc.)? Is it possible to build a confusion matrix directly from these columns?


Solution

  • Use isna() to test where each column is missing, and label the rows where the two columns disagree:

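    # FN: the detector kept the point, but the ground truth marks it as an anomaly (NaN)
    df.loc[df['rr_filtered'].notna() & df['rr_manually_cleaned'].isna(), 'TEST'] = 'FN'

    # FP: the detector removed the point (NaN), but the ground truth keeps it
    df.loc[df['rr_manually_cleaned'].notna() & df['rr_filtered'].isna(), 'TEST'] = 'FP'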
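
The same isna() masks can be extended to all four outcomes, which gives a confusion matrix and the usual metrics. The following is a minimal sketch, not a definitive implementation. It assumes the three columns are aligned row by row and that NaN always means "removed as an anomaly". The toy values in df are made up for illustration, and "anomaly" is treated as the positive class.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: NaN marks a point that was removed as an anomaly.
df = pd.DataFrame({
    "rr_noisy":            [800, 810, 1500, 805, 300, 790, 795, 2000],
    "rr_manually_cleaned": [800, 810, np.nan, 805, np.nan, 790, 795, np.nan],
    "rr_filtered":         [800, 810, np.nan, np.nan, 300, 790, 795, np.nan],
})

# Positive class = anomaly. Ground truth and prediction as boolean masks.
y_true = df["rr_manually_cleaned"].isna()  # True where the point really is an anomaly
y_pred = df["rr_filtered"].isna()          # True where the detector removed the point

# Count the four outcomes.
tp = int((y_true & y_pred).sum())    # anomaly, and detected
fp = int((~y_true & y_pred).sum())   # normal point, but removed
fn = int((y_true & ~y_pred).sum())   # anomaly, but kept
tn = int((~y_true & ~y_pred).sum())  # normal point, and kept

# Guard against empty denominators (e.g. the detector never flags anything).
precision = tp / (tp + fp) if (tp + fp) else float("nan")
recall = tp / (tp + fn) if (tp + fn) else float("nan")
f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else float("nan")

# Confusion matrix with readable labels (rows: actual, columns: predicted).
labels = {True: "anomaly", False: "normal"}
confusion = pd.crosstab(
    y_true.map(labels), y_pred.map(labels),
    rownames=["actual"], colnames=["predicted"],
)

print(confusion)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

If scikit-learn is available, the y_true and y_pred masks could instead be passed to its metric functions. The manual counts are used here to keep the sketch dependent on pandas only.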