Predicting the probability of class assignment for each chosen sample from the Train_features
:
probs = classifier.predict_proba(Train_features)`
Choosing the class for which the AUC has to be determined.
preds = probs[:,1]
Calculating false positive rate, true positive rate and the possible thresholds that can clearly separate TP and TN.
fpr, tpr, threshold = metrics.roc_curve(Train_labels, preds)
roc_auc = metrics.auc(fpr, tpr)
print(max(threshold))
Output : 1.97834
The previous answer did not really address your question of why the threshold is > 1, and in fact is misleading when it says the threshold does not have any interpretation.
The range of threshold should technically be [0,1] because it is the probability threshold. But scikit learn adds +1 to the last number in the threshold array to cover the full range [0, 1]. So if in your example the max(threshold) = 1.97834, the very next number in the threshold array should be 0.97834.
See this sklearn github issue thread for an explanation. It's a little funny because somebody thought this is a bug, but it's just how the creators of sklearn decided to define threshold.
Finally, because it is a probability threshold, it does have a very useful interpretation. The optimal cutoff is the threshold at which sensitivity + specificity are maximum. In sklearn learn this can be computed like so
fpr_p, tpr_p, thresh = roc_curve(true_labels, pred)
# maximize sensitivity + specificity, i.e. tpr + (1-fpr) or just tpr-fpr
th_optimal = thresh[np.argmax(tpr_p - fpr_p)]