
Area under the ROC curve using Sklearn?


I can't figure out why the Sklearn function roc_auc_score returns 1 in the following case:

from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 0, 0, 0, 0, 1]

y_scores = [0.18101096153259277, 0.15506085753440857, 
            0.9940806031227112, 0.05024950951337814, 
            0.7381414771080017, 0.8922111988067627, 
            0.8253260850906372, 0.9967281818389893]

roc_auc_score(y_true, y_scores)

The three scores 0.7381414771080017, 0.8922111988067627, and 0.8253260850906372 are fairly high, yet their labels are 0, 0, 0. So how can the AUC be 1? What am I getting wrong here?


Solution

  • The AUC of the ROC curve only measures how well your model rank-orders the datapoints with respect to your positive class.

    In your example, every positive-class datapoint is scored higher than every negative-class datapoint, so a roc_auc_score of 1 is correct. Sorting by score makes this easy to see, and the pairwise check sketched after the table confirms it:

    import pandas as pd

    pd.DataFrame({'y_scores': y_scores, 'y_true': y_true}).sort_values('y_scores', ascending=False)
    
        y_scores    y_true
    7   0.996728    1
    2   0.994081    1
    5   0.892211    0
    6   0.825326    0
    4   0.738141    0
    0   0.181011    0
    1   0.155061    0
    3   0.050250    0
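
    If you want to verify this numerically: ROC AUC equals the fraction of (positive, negative) pairs in which the positive datapoint gets the higher score (ties counted as half). Here is a minimal sketch of that pairwise computation; the pairwise_auc helper is just for illustration, not part of scikit-learn:

    from itertools import product

    from sklearn.metrics import roc_auc_score

    def pairwise_auc(y_true, y_scores):
        # AUC as the probability that a random positive outranks a random negative;
        # tied scores count as half a win.
        positives = [s for t, s in zip(y_true, y_scores) if t == 1]
        negatives = [s for t, s in zip(y_true, y_scores) if t == 0]
        wins = sum(
            1.0 if p > n else 0.5 if p == n else 0.0
            for p, n in product(positives, negatives)
        )
        return wins / (len(positives) * len(negatives))

    y_true = [0, 0, 1, 0, 0, 0, 0, 1]
    y_scores = [0.18101096153259277, 0.15506085753440857,
                0.9940806031227112, 0.05024950951337814,
                0.7381414771080017, 0.8922111988067627,
                0.8253260850906372, 0.9967281818389893]

    print(pairwise_auc(y_true, y_scores))           # 1.0 -> all 2 * 6 = 12 pairs are ordered correctly
    print(roc_auc_score(y_true, y_scores))          # 1.0

    # Push one positive score below some negatives and the AUC drops,
    # because some (positive, negative) pairs are now mis-ordered.
    y_scores_flipped = list(y_scores)
    y_scores_flipped[2] = 0.5                       # this positive now scores below three negatives
    print(roc_auc_score(y_true, y_scores_flipped))  # 0.75 (9 of 12 pairs still correct)

    In other words, the absolute values of the scores don't matter for AUC, only their ordering relative to the labels does.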