python scikit-learn threshold roc

Python sklearn ROC-AUC curve with only one feature and various thresholds


I'm relatively new to this field and a bit confused right now, so I'll explain. I have some elements in my data, each with a value between 0 and 1 and an associated label (1 or 0). I need to test some thresholds: for example, with a threshold of 0.4, all values >= 0.4 are predicted as true (1) and all values < 0.4 are predicted as false (0). I don't think I need a machine learning classifier because, based on the threshold I choose, I already know which label to assign to each element.

This is what I've done so far:

prediction = []
for row in range(dfAggr.shape[0]):
    if dfAggr['value'].values[row] >= threshold:
        prediction.append(1)
    else:
        prediction.append(0)

label = dfAggr['truth'].values.astype(int)

#ROC CURVE
fpr, tpr, thresholds = roc_curve(label, prediction)
roc_auc = auc(fpr, tpr)
plt.plot(fpr, tpr, lw=1, label='ROC (area = %0.2f)' % (roc_auc))
plt.plot([0, 1], [0, 1], '--', color=(0.6, 0.6, 0.6), label='Luck')
plt.xlim([-0.05, 1.05])
plt.ylim([-0.05, 1.05])
plt.grid()
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic')
plt.legend(loc="lower right")
plt.savefig("rocauc.pdf", format="pdf")
plt.show()

And I obtain this plot: [image: ROC curve plot]

I think this plot is wrong, since I want a ROC curve built by testing each possible threshold between 0 and 1, so that I can find the best possible cutoff value.

Is what I've done conceptually wrong?


Solution

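  • I assume you are using from sklearn.metrics import roc_curve. The roc_curve function goes through all the thresholds for you, so there is no need to pre-select one yourself. Pass it the raw values instead of the thresholded 0/1 predictions: once the values are binarized, only two distinct scores remain, so the curve has a single point between (0, 0) and (1, 1).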

    You should do something like this:

    predictions = dfAggr['value'].values
    label = dfAggr['truth'].values.astype(int)
    fpr, tpr, thresholds = roc_curve(label, predictions)
    [...]
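
To make the difference concrete, here is a minimal sketch on made-up data. The scores and labels arrays are hypothetical stand-ins for dfAggr['value'] and dfAggr['truth']. The last step, choosing the cutoff that maximizes Youden's J statistic (tpr - fpr), is an assumption on my part: it is one common way to pick a "best" threshold, not something roc_curve does for you, and it may not be the right criterion for your problem.

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

# Hypothetical toy data standing in for dfAggr['value'] and dfAggr['truth'].
scores = np.array([0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.85, 0.95])
labels = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1])

# Approach from the question: binarize at one fixed cutoff first.
# Only two distinct score values remain, so roc_curve returns just
# three points: (0, 0), one interior point, and (1, 1).
binary = (scores >= 0.4).astype(int)
fpr_bin, tpr_bin, thr_bin = roc_curve(labels, binary)

# Approach from the answer: pass the raw scores and let roc_curve
# evaluate the candidate thresholds itself.
fpr, tpr, thresholds = roc_curve(labels, scores)
roc_auc = auc(fpr, tpr)

# Assumed selection rule (not part of roc_curve): Youden's J statistic.
j = tpr - fpr
best = int(np.argmax(j))
best_threshold = thresholds[best]

print("points with binarized input:", len(fpr_bin))
print("points with raw scores:", len(fpr))
print("AUC with raw scores:", roc_auc)
print("cutoff maximizing tpr - fpr:", best_threshold)
```

The returned thresholds are in decreasing order, and a value is counted as positive when it is >= the threshold, which matches the comparison used in the loop from the question.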