Tags: python, machine-learning, roc, auc, precision-recall

How to calculate TPR and FPR in Python without using sklearn?


Initialize the list of lists:

import pandas as pd

data = [[1.0, 0.635165, 0.0], [1.0, 0.766586, 1.0], [1.0, 0.724564, 1.0],
        [1.0, 0.766586, 1.0], [1.0, 0.889199, 1.0], [1.0, 0.966586, 1.0],
        [1.0, 0.535165, 0.0], [1.0, 0.55165, 0.0], [1.0, 0.525165, 0.0],
        [1.0, 0.5595165, 0.0]]

Create the pandas DataFrame:

df = pd.DataFrame(data, columns=['y', 'prob', 'y_predict'])

Print the DataFrame:

print(df)

For this dataset, I want to find:

  1. The confusion matrix, without using sklearn.
  2. NumPy arrays of TPR and FPR, without using sklearn, for plotting the ROC curve.

How can I do this in Python?


Solution

  • You can compute the false positive rate (FPR) and true positive rate (TPR) at each threshold level as follows:

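    import numpy as np

    def roc_curve(y_true, y_prob, thresholds):
        """Return FPR and TPR as NumPy arrays, one value per threshold."""
        # Accept lists or pandas Series as well as arrays.
        y_true = np.asarray(y_true)
        y_prob = np.asarray(y_prob)

        fpr = []
        tpr = []

        for threshold in thresholds:
            # Predict the positive class when the score reaches the threshold.
            y_pred = np.where(y_prob >= threshold, 1, 0)

            # Confusion-matrix counts at this threshold.
            fp = np.sum((y_pred == 1) & (y_true == 0))
            tp = np.sum((y_pred == 1) & (y_true == 1))

            fn = np.sum((y_pred == 0) & (y_true == 1))
            tn = np.sum((y_pred == 0) & (y_true == 0))

            # A rate is undefined (NaN) if its class never occurs in y_true.
            fpr.append(fp / (fp + tn) if fp + tn > 0 else np.nan)
            tpr.append(tp / (tp + fn) if tp + fn > 0 else np.nan)

        return np.array(fpr), np.array(tpr)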
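
  • The question also asks for a confusion matrix, which the function above does not return. Below is a minimal sketch, assuming binary 0/1 labels. The helper names `confusion_matrix_np` and `roc_points` are hypothetical, and the labels and scores are made-up illustration values, not the data from the question.

```python
import numpy as np

def confusion_matrix_np(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary 0/1 labels (hypothetical helper)."""
    y_true = np.asarray(y_true).astype(int)
    y_pred = np.asarray(y_pred).astype(int)
    cm = np.zeros((2, 2), dtype=int)
    # Rows index the true class, columns index the predicted class.
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

def roc_points(y_true, y_prob, thresholds):
    """Return (fpr, tpr) as NumPy arrays, one entry per threshold."""
    y_true = np.asarray(y_true).astype(int)
    y_prob = np.asarray(y_prob, dtype=float)
    fpr, tpr = [], []
    for t in thresholds:
        (tn, fp), (fn, tp) = confusion_matrix_np(y_true, y_prob >= t)
        # A rate is undefined (NaN) when its class is absent from y_true.
        fpr.append(fp / (fp + tn) if fp + tn else np.nan)
        tpr.append(tp / (tp + fn) if tp + fn else np.nan)
    return np.array(fpr), np.array(tpr)

# Illustrative labels and scores (assumed for this sketch only).
y_true = np.array([0, 0, 1, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8])

cm = confusion_matrix_np(y_true, y_prob >= 0.5)
fpr, tpr = roc_points(y_true, y_prob, thresholds=[0.0, 0.35, 0.5, 1.1])
```

    Note that in the question's data the `y` column is 1.0 in every row, so there are no negatives and FPR is 0/0 at every threshold. A ROC curve needs both classes in the true labels. With the question's DataFrame, the calls would be `confusion_matrix_np(df['y'], df['y_predict'])` and `roc_points(df['y'], df['prob'], thresholds)`.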