I am using LogisticRegression from the sklearn package, and have a quick question about classification. I built a ROC curve for my classifier, and it turns out that the optimal threshold for my training data is around 0.25. I'm assuming that the default threshold when creating predictions is 0.5. How can I change this default setting to find out what the accuracy is in my model when doing a 10-fold cross-validation? Basically, I want my model to predict a '1' for anyone greater than 0.25, not 0.5. I've been looking through all the documentation, and I can't seem to get anywhere.
That is not a built-in feature. You can "add" it by wrapping the LogisticRegression class in your own class and adding a threshold attribute that you use inside a custom predict() method.
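As a rough illustration of that wrapper idea, here is a minimal sketch for a binary problem. The class name ThresholdLogisticRegression, its threshold parameter, and the synthetic data from make_classification are all assumptions made up for this example; they are not part of scikit-learn's LogisticRegression API or of the asker's setup.

```python
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


class ThresholdLogisticRegression(ClassifierMixin, BaseEstimator):
    """Hypothetical wrapper: binary logistic regression with a custom cutoff."""

    def __init__(self, threshold=0.5, C=1.0, max_iter=1000):
        self.threshold = threshold
        self.C = C
        self.max_iter = max_iter

    def fit(self, X, y):
        # Delegate the actual fitting to a regular LogisticRegression.
        self.model_ = LogisticRegression(C=self.C, max_iter=self.max_iter).fit(X, y)
        self.classes_ = self.model_.classes_
        return self

    def predict_proba(self, X):
        return self.model_.predict_proba(X)

    def predict(self, X):
        # Column 1 holds the probability of the second class in classes_.
        positive = self.predict_proba(X)[:, 1] > self.threshold
        return self.classes_[positive.astype(int)]


# Synthetic, unbalanced stand-in data (an assumption, not the asker's data).
X, y = make_classification(n_samples=500, weights=[0.8, 0.2], random_state=0)

# 10-fold cross-validated accuracy with the 0.25 cutoff from the question.
scores = cross_val_score(
    ThresholdLogisticRegression(threshold=0.25), X, y, cv=10, scoring="accuracy"
)
print(scores.mean())
```

Because the wrapper follows the usual estimator conventions (parameters stored in __init__, fit() returning self), cross_val_score can clone it for each fold. If I recall correctly, scikit-learn 1.5 and later also provide FixedThresholdClassifier in sklearn.model_selection, which may make a hand-written wrapper unnecessary on recent versions.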
However, some cautions:
LogisticRegression.decision_function() returns a signed distance to the selected separating hyperplane, and predict() effectively thresholds that distance at 0. If you are looking at predict_proba(), then you are looking at the logistic (inverse logit) function of the hyperplane distance, where the equivalent threshold is 0.5. But the probabilities are more expensive to compute than the raw distance.
Consider using class_weight if you have an unbalanced problem rather than manually setting the threshold. This should force the classifier to choose a hyperplane farther away from the class of serious interest.
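To make the two cautions above concrete, here is a small sketch for a binary problem with default LogisticRegression settings. The synthetic data and the variable names are assumptions for illustration only. It checks that a probability cutoff of 0.25 corresponds to a decision_function() cutoff of logit(0.25), and compares how many positives a class_weight="balanced" model predicts against an unweighted one.

```python
import numpy as np
from scipy.special import expit, logit
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic, unbalanced stand-in data (an assumption for this example).
X, y = make_classification(n_samples=500, weights=[0.8, 0.2], random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X, y)
distance = clf.decision_function(X)   # signed distance, default cutoff is 0
proba = clf.predict_proba(X)[:, 1]    # probability of class 1, default cutoff is 0.5

# In the binary case the probability is the sigmoid of the distance.
same_values = np.allclose(expit(distance), proba)

# A cutoff of 0.25 on the probability matches logit(0.25) on the distance.
by_proba = proba > 0.25
by_distance = distance > logit(0.25)
agreement = (by_proba == by_distance).mean()

# Alternative to moving the cutoff: reweight the classes during fitting.
balanced = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)
n_pos_default = int(clf.predict(X).sum())
n_pos_balanced = int(balanced.predict(X).sum())

print(same_values, agreement, n_pos_default, n_pos_balanced)
```

Thresholding decision_function() directly avoids computing the sigmoid, which is the cost the first caution refers to. The class_weight route changes the fitted coefficients themselves, so it is not numerically identical to shifting the cutoff, but it tends to push predictions toward the minority class in a similar way.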