python scikit-learn roc outliers auc

ROC curve for Isolation Forest


I am trying to plot the ROC curve to evaluate the accuracy of Isolation Forest on a Breast Cancer dataset. I calculated the True Positive Rate (TPR) and False Positive Rate (FPR) from the confusion matrix. However, I do not understand why the TPR and FPR come out as arrays instead of single values. The ROC plot also seems to work only when FPR and TPR are arrays (I also tried writing the FPR and TPR calculation by hand).

Are the TPR and FPR values always arrays?

Either way, my ROC curve comes out as a straight line. Why is it so?

Confusion Matrix :

from sklearn.metrics import confusion_matrix
cnf_matrix = confusion_matrix(y, y_pred_test1)

O/P :

>     [[  5  25]
>      [ 21 180]]

True Positive and False Positive : (Also, why are these values directly taken from the confusion matrix?)

import numpy as np

# Per-class counts: element i treats class i as the positive class.
F_P = cnf_matrix.sum(axis=0) - np.diag(cnf_matrix)
F_N = cnf_matrix.sum(axis=1) - np.diag(cnf_matrix)
T_P = np.diag(cnf_matrix)
T_N = cnf_matrix.sum() - (F_P + F_N + T_P)

F_P = F_P.astype(float)
F_N = F_N.astype(float)
T_P = T_P.astype(float)
T_N = T_N.astype(float)

O/P :

False Positive [21. 25.] 
False Negative [25. 21.] 
True Positive [  5. 180.] 
True Negative [180.   5.]

TPR and FPR :

tp_rate = T_P / (T_P + F_N)
fp_rate = F_P / (F_P + T_N)

O/P :

TPR :  [0.16666667 0.89552239]
FPR [0.10447761 0.83333333]

ROC curve :

from sklearn import metrics
import matplotlib.pyplot as plt

plt.plot(fp_rate,tp_rate)
plt.show()

O/P :

[image: the resulting ROC plot]


Solution

  • The confusion matrix gives you only a single point on the ROC curve. To construct a full ROC curve you need a continuous score (or probability) for each instance; the curve is then traced by varying the threshold that turns the score into a class prediction and recording the (FPR, TPR) pair at each threshold.

    In your simple case (when you have only one point of the ROC curve) you could plot the ROC curve by joining that point to the origin (0,0) and to the point (1,1):

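    import matplotlib.pyplot as plt

    # Counts read from your confusion matrix, taking class 1 as positive
    # (rows are true labels, columns are predicted labels).
    TP = 180
    FN = 21
    FP = 25
    TN = 5

    tpr = TP / (TP + FN)
    fpr = FP / (FP + TN)  # FPR comes from the negatives, not from 1 - TPR

    # Join the single point to (0, 0) and (1, 1).
    tpr_line = [0, tpr, 1]
    fpr_line = [0, fpr, 1]

    plt.plot(fpr_line, tpr_line, 'k-', lw=2)
    plt.xlabel('FPR')
    plt.ylabel('TPR')
    plt.xlim(0, 1)
    plt.ylim(0, 1)
    plt.show()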
    

    and the ROC curve looks like:

    [image: example single-point ROC curve]
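
To get a full curve rather than a single point, one option is to feed the model's continuous anomaly scores to `sklearn.metrics.roc_curve`, which does the threshold sweep for you. The following is a minimal sketch, not the asker's actual setup: it assumes scikit-learn's bundled breast cancer data as a stand-in dataset, treats the malignant class (label 0) as the anomaly, and uses default `IsolationForest` settings. The names `y_anomaly`, `scores`, and `plot_roc` are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score, roc_curve

# Assumption: the bundled dataset stands in for the asker's data, and
# label 0 (malignant) is treated as the positive "anomaly" class.
X, y = load_breast_cancer(return_X_y=True)
y_anomaly = (y == 0).astype(int)

model = IsolationForest(random_state=0).fit(X)

# score_samples is lower for more abnormal points, so negate it to get
# a score that increases with "anomalousness".
scores = -model.score_samples(X)

# roc_curve sweeps the threshold over the scores and returns one
# (FPR, TPR) pair per threshold.
fpr, tpr, thresholds = roc_curve(y_anomaly, scores)
auc = roc_auc_score(y_anomaly, scores)


def plot_roc(fpr, tpr, auc):
    """Plot the ROC curve together with the chance diagonal."""
    plt.plot(fpr, tpr, lw=2, label=f"Isolation Forest (AUC = {auc:.3f})")
    plt.plot([0, 1], [0, 1], "k--", lw=1, label="chance")
    plt.xlabel("FPR")
    plt.ylabel("TPR")
    plt.xlim(0, 1)
    plt.ylim(0, 1)
    plt.legend()
    plt.show()
```

Calling `plot_roc(fpr, tpr, auc)` draws the curve. Here `fpr` and `tpr` are arrays because each element corresponds to a different threshold, which is different from the per-class arrays derived from a single confusion matrix.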