ROC-AUC FPR FNR in Python and R?

I have a dataframe object in R/Python that looks like:

df columns:
fraud = [1,1,0,0,0,0,0,0,0,1]
score = [0.84, 1, 1.1, 0.4, 0.6, 0.13, 0.32, 1.4, 0.9, 0.45]

When I use roc_curve in Python I get fpr, tpr and thresholds (fnr is just 1 - tpr).

I have 2 questions, maybe a bit theoretical but please explain it to me:

Are these thresholds actually calculated? I have calculated fpr and fnr manually, but are these thresholds equal to the scores above?
How can I generate the same fpr, fnr and thresholds in R?

Solution

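The thresholds are candidate cutoffs on the score; each one produces its own pair of fpr and tpr. The threshold that is usually reported as the "best" one is the value which maximizes tpr + tnr (sensitivity + specificity). This is called Youden's J index (tpr + tnr - 1), but it also has several other names.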
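To connect this to the data in the question, here is a minimal pure-Python sketch (no libraries; `roc_points` is a name made up for this illustration). It assumes a case is flagged when its score is greater than or equal to the threshold, which is, as far as I know, the convention scikit-learn's roc_curve follows. roc_curve additionally prepends one threshold above the maximum score and by default drops some intermediate points, so its output can be shorter than what this sketch prints. pROC, to my knowledge, reports cutoffs halfway between adjacent scores instead, so its thresholds will not be identical to the scores.

```python
# Scores and labels copied from the question.
fraud = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
score = [0.84, 1, 1.1, 0.4, 0.6, 0.13, 0.32, 1.4, 0.9, 0.45]


def roc_points(y_true, y_score):
    """Return (threshold, fpr, tpr, fnr) for every distinct score, highest first.

    A case is predicted positive when its score is >= the threshold.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    rows = []
    for thr in sorted(set(y_score), reverse=True):
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= thr and y == 0)
        tpr = tp / n_pos
        fpr = fp / n_neg
        rows.append((thr, fpr, tpr, 1 - tpr))
    return rows


points = roc_points(fraud, score)
for thr, fpr, tpr, fnr in points:
    print(f"thr={thr:<5} fpr={fpr:.3f} tpr={tpr:.3f} fnr={fnr:.3f}")

# Youden's J = tpr + tnr - 1 = tpr - fpr; keep the threshold that maximizes it.
best_thr, best_fpr, best_tpr, _ = max(points, key=lambda p: p[2] - p[1])
print("best threshold:", best_thr, "J:", best_tpr - best_fpr)
```

So the thresholds are not estimated separately: every distinct score is tried as a cutoff, and the "best" one is simply the row with the largest tpr - fpr.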

Take the following example with the Sonar dataset:

library(mlbench)
library(xgboost)
library(caret)
library(pROC)
data(Sonar)

Let's fit a model on part of the Sonar data and predict on another part:

ind <- createDataPartition(Sonar$Class, p = 0.7, list = FALSE)
train <- Sonar[ind,]
test <- Sonar[-ind,]
X = as.matrix(train[, -61])
dtrain = xgb.DMatrix(data = X, label = as.numeric(train$Class)-1)
dtest <- xgb.DMatrix(data = as.matrix(test[, -61]))

Fit the model on the train data:

model = xgb.train(data = dtrain,
                  verbose = 0,  maximize = TRUE,
                  params = list(objective = "binary:logistic",
                                eval_metric = "auc",  # the metric is set via eval_metric inside params
                                eta = 0.1,
                                max_depth = 6,
                                subsample = 0.8,
                                lambda = 0.1),
                  nrounds = 10)

preds <- predict(model, dtest)
true <- as.numeric(test$Class)-1


plot(roc(response = true,
         predictor = preds,
         levels = c(0, 1)),
     lwd = 1.5, print.thres = TRUE, print.auc = TRUE, print.auc.y = 0.5)

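So if you set the threshold at 0.578 you will maximize the value tpr + tnr. The values in parentheses on the plot are the specificity (tnr) and the sensitivity (tpr) at that threshold, in that order. The train/test split above is random, so the exact numbers will differ between runs. Verify: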

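# Note: caret treats the first factor level ("0") as the positive class by default,
# so "sensitivity" here refers to class "0"; pass positive = "1" / negative = "0" to change that.
sensitivity(as.factor(ifelse(preds > 0.578, "1", "0")), as.factor(true))
#output
[1] 0.9090909
specificity(as.factor(ifelse(preds > 0.578, "1", "0")), as.factor(true))
#output
[1] 0.7586207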
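Which number is called sensitivity and which specificity depends on the class treated as positive; caret's sensitivity() and specificity() take the first factor level as positive unless told otherwise. The following Python sketch illustrates the effect on made-up labels (the data and the helper `sens_spec` are hypothetical, not the Sonar results):

```python
def sens_spec(y_true, y_pred, positive):
    """Sensitivity and specificity of y_pred, treating `positive` as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy labels and predictions.
truth = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
pred = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]

sens1, spec1 = sens_spec(truth, pred, positive=1)
sens0, spec0 = sens_spec(truth, pred, positive=0)
print("positive = 1:", sens1, spec1)
print("positive = 0:", sens0, spec0)
```

Switching the positive class swaps the two values, while their sum (and therefore Youden's J) stays the same.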

You could also compute sensitivity and specificity over many possible thresholds:

thresh <- do.call(rbind, lapply((1:1000)/1000, function(x){
  pred <- as.factor(ifelse(preds > x, "1", "0"))  # class predictions at cutoff x
  sens <- sensitivity(pred, as.factor(true))
  spec <- specificity(pred, as.factor(true))
  data.frame(sens, spec)
}))

and now:

thresh[which.max(rowSums(thresh)),]
#output
         sens      spec
560 0.9090909 0.7586207

You can also inspect the thresholds around it:

thresh[555:600,]
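A grid search like this usually finds a whole run of equally good thresholds, because every cutoff that falls between two adjacent predicted values yields the same class predictions. That is why the first maximizing row of the grid need not equal the threshold printed on the plot. A small Python sketch of the effect, on hypothetical predictions and labels:

```python
# Hypothetical predicted probabilities and true labels.
preds = [0.10, 0.30, 0.52, 0.61, 0.80, 0.95]
truth = [0, 0, 0, 1, 1, 1]


def youden(thr):
    """Youden's J when a case is flagged if its prediction is > thr."""
    tp = sum(1 for p, t in zip(preds, truth) if p > thr and t == 1)
    tn = sum(1 for p, t in zip(preds, truth) if p <= thr and t == 0)
    return tp / truth.count(1) + tn / truth.count(0) - 1


grid = [i / 1000 for i in range(1, 1001)]
j_values = [youden(t) for t in grid]
best = max(j_values)

# All grid points that reach the maximum form one contiguous plateau.
plateau = [t for t, j in zip(grid, j_values) if j == best]
print(plateau[0], plateau[-1], len(plateau))
```

Any value on the plateau is an equally valid cutoff for this sample; taking the first one or the midpoint between the two neighbouring predictions is only a convention.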

That being said, when working with financial data, usually not only the predicted class is of interest but also the cost associated with false predictions, which is typically not the same for false negatives and false positives. Such models are therefore often fit using cost-sensitive classification. On another note, when deciding on the threshold, you should do it either on cross-validated predictions or on a validation set specifically designated for the task. If you choose it on the test set, that inevitably leads to over-optimistic performance estimates.
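As a rough sketch of what picking a threshold by cost rather than by Youden's J could look like, the Python snippet below scans the candidate cutoffs on a validation set and keeps the cheapest one. The validation data, the cost values and the function names are all hypothetical; in practice the costs would come from the business problem.

```python
def total_cost(y_true, y_score, thr, cost_fp, cost_fn):
    """Total misclassification cost when cases with score >= thr are flagged."""
    fp = sum(1 for y, s in zip(y_true, y_score) if s >= thr and y == 0)
    fn = sum(1 for y, s in zip(y_true, y_score) if s < thr and y == 1)
    return cost_fp * fp + cost_fn * fn


def cheapest_threshold(y_true, y_score, cost_fp, cost_fn):
    """Try every distinct score (plus 'flag nothing') and return the cheapest cutoff."""
    candidates = sorted(set(y_score)) + [float("inf")]
    return min(candidates,
               key=lambda t: total_cost(y_true, y_score, t, cost_fp, cost_fn))


# Hypothetical validation set, kept separate from the final test set.
val_truth = [0, 0, 1, 0, 1, 0, 1, 1]
val_score = [0.05, 0.20, 0.35, 0.40, 0.55, 0.60, 0.75, 0.90]

# Missing a fraud is 10x as costly as a false alarm, and the other way round.
thr_fn_heavy = cheapest_threshold(val_truth, val_score, cost_fp=1, cost_fn=10)
thr_fp_heavy = cheapest_threshold(val_truth, val_score, cost_fp=10, cost_fn=1)
print(thr_fn_heavy, thr_fp_heavy)
```

Expensive false negatives push the chosen cutoff down (more cases get flagged), while expensive false positives push it up.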