Tags: python-3.x, scikit-learn, logistic-regression

Sklearn logistic regression - adjust cutoff point


I have a logistic regression model trying to predict one of two classes: A or B.

  • My model's accuracy when predicting A is ~85%.
  • The model's accuracy when predicting B is ~50%.
  • Predicting B is not important; predicting A, however, is very important.

My goal is to maximize the accuracy when predicting A. Is there any way to adjust the default decision threshold when determining the class?

import numpy as np
from sklearn.linear_model import LogisticRegression

classifier = LogisticRegression(penalty='l2', solver='saga', multi_class='ovr')
classifier.fit(np.float64(X_train), np.float64(y_train))

Thanks! RB


Solution

  • As mentioned in the comments, the threshold is selected after training. You can find the threshold that maximizes a utility function of your choice, for example:

    import numpy as np
    from sklearn import metrics

    # Column 1 of predict_proba holds the probability of the positive class.
    preds = classifier.predict_proba(test_data)
    fpr, tpr, thresholds = metrics.roc_curve(test_y, preds[:, 1])
    print(thresholds)

    accuracy_ls = []
    for thres in thresholds:
        # roc_curve counts scores >= threshold as positive, so use >= here too.
        y_pred = np.where(preds[:, 1] >= thres, 1, 0)
        # Apply the desired utility function to y_pred, for example accuracy.
        accuracy_ls.append(metrics.accuracy_score(test_y, y_pred, normalize=True))
    

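    After that, choose the threshold that maximizes the chosen utility function. In your case, score each threshold on how well class A (label 1 in y_pred) is predicted, for example with precision or recall for that class, and pick the threshold that scores best.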
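
    The selection step can be sketched end to end as follows. This is only an illustration under several assumptions that are not part of the question: the data is synthetic (`make_classification`), class A is taken to be encoded as label 1, and F1 for that class stands in as the utility function. Swap in whichever metric matches what "accuracy when predicting A" means for you.

```python
import numpy as np
from sklearn import metrics
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic two-class data stands in for the real problem (assumption).
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.7, 0.3], random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

classifier = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Column 1 is P(label == 1); label 1 plays the role of class A (assumption).
proba_a = classifier.predict_proba(X_val)[:, 1]

# Candidate thresholds come from the ROC curve, as in the answer above.
_, _, thresholds = metrics.roc_curve(y_val, proba_a)
# Recent scikit-learn versions prepend an infinite threshold; drop it.
thresholds = thresholds[np.isfinite(thresholds)]


def utility(threshold):
    """F1 for class 1 when scores >= threshold are predicted as class 1."""
    y_pred = (proba_a >= threshold).astype(int)
    return metrics.f1_score(y_val, y_pred, pos_label=1, zero_division=0)


scores = np.array([utility(t) for t in thresholds])
best_threshold = thresholds[np.argmax(scores)]

# Predictions with the tuned threshold instead of the default 0.5.
y_pred_best = (proba_a >= best_threshold).astype(int)
print("best threshold:", best_threshold, "F1 for class 1:", scores.max())
```

    Pick the threshold on a validation split rather than on the final test set, otherwise the reported score for the chosen threshold will be optimistic.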