python pandas scikit-learn confusion-matrix

Confusion matrix over multiple thresholds


I'm trying to run sklearn.metrics.confusion_matrix efficiently over multiple thresholds, so that I can tell the customer what performance to expect at any given % challenge of the population.

Currently I'm doing it in a loop over all possible thresholds, but this is slow and inefficient. Is there a way to do it in a one-liner, or something similar?

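import numpy as np
from sklearn.metrics import confusion_matrix

threshold_list = np.linspace(1, 0, 1001).tolist()
y_true = df['true'].astype('int16').values
for threshold in threshold_list:
    # Recompute the predictions from scratch for each threshold
    df['prediction'] = (df['score'] >= threshold).astype('int16')
    arr = confusion_matrix(y_true, df['prediction'].values)
    ....
    ....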

Solution

  • If TPR and FPR are enough for you, you can do the following:

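    import pandas as pd
    from sklearn.metrics import roc_curve

    y_true = [1, 0, 0, 1, 1, 0, 0]
    # Scores (predicted probabilities), not hard class labels
    y_pred = [0.67, 0.48, 0.27, 0.52, 0.63, 0.45, 0.53]
    # roc_curve returns FPR and TPR at each threshold it keeps, in one call
    fpr, tpr, thresholds = roc_curve(y_true, y_pred)
    res = pd.DataFrame({'FPR': fpr, 'TPR': tpr, 'Threshold': thresholds})
    res[['TPR', 'FPR', 'Threshold']]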
    

    Output:

        TPR         FPR Threshold
    0   0.333333    0.00    0.67
    1   0.666667    0.00    0.63
    2   0.666667    0.25    0.53
    3   1.000000    0.25    0.52
    4   1.000000    1.00    0.27
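
If the full confusion matrix is needed at every threshold, not just the two rates, one possible approach is to sort the scores once per class and count with np.searchsorted instead of calling confusion_matrix in a loop. This is a minimal sketch of a different technique from the roc_curve call above, not part of the original answer. The helper name confusion_counts is hypothetical, and it assumes binary 0/1 labels and the rule "predict 1 when score >= threshold".

```python
import numpy as np

def confusion_counts(y_true, y_score, thresholds):
    """Hypothetical helper: TP, FP, FN, TN at each threshold.

    Assumes binary 0/1 labels and predicts 1 when score >= threshold.
    """
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)

    # Sort the scores of each class once (ascending)
    pos_scores = np.sort(y_score[y_true])
    neg_scores = np.sort(y_score[~y_true])

    # searchsorted(..., side='left') counts the scores strictly below t,
    # so size minus that count is the number of scores >= t
    tp = pos_scores.size - np.searchsorted(pos_scores, thresholds, side='left')
    fp = neg_scores.size - np.searchsorted(neg_scores, thresholds, side='left')
    fn = pos_scores.size - tp
    tn = neg_scores.size - fp
    return tp, fp, fn, tn

y_true = [1, 0, 0, 1, 1, 0, 0]
y_score = [0.67, 0.48, 0.27, 0.52, 0.63, 0.45, 0.53]
thresholds = [0.67, 0.63, 0.53, 0.52, 0.27]

tp, fp, fn, tn = confusion_counts(y_true, y_score, thresholds)
# Share of the population scored at or above each threshold
flagged = (tp + fp) / len(y_true)
print(np.column_stack([tp, fp, fn, tn]))
```

Dividing tp by the number of positives (3) and fp by the number of negatives (4) gives the same TPR and FPR values as the table above. For the grid in the question, np.linspace(1, 0, 1001) could be passed as thresholds, and flagged would then give the fraction of the population challenged at each one.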