I'd like to understand why the precision-recall curve for the minority class ("1") looks so good while the precision (0.2) and recall (0.4) metrics for that same class are so bad. I used sklearn.metrics.plot_precision_recall_curve with pos_label=0 (majority class) and pos_label=1 (minority class). Below is the code I used.
import matplotlib.pyplot as plt
from sklearn import metrics

def plotagem_curvas(nome_modelo, modelo, X_test, y_test, folds, pos_label):
    roc_auc = 0
    ap = 0
    if pos_label == 0:
        classe = 'Not Stroke'
    else:
        classe = 'Stroke'
    fig, axs = plt.subplots(1, 2, figsize=(12, 4))
    axs[0].set_title("Curva ROC - " + nome_modelo + " \"" + classe + "\"", fontsize=10)
    disp = metrics.plot_roc_curve(modelo, X_test, y_test, ax=axs[0], pos_label=pos_label)
    roc_auc = disp.roc_auc
    axs[1].set_title("Curva Precision Recall - " + nome_modelo + " \"" + classe + "\"", fontsize=10)
    disp = metrics.plot_precision_recall_curve(modelo, X_test, y_test, ax=axs[1], pos_label=pos_label)
    ap = disp.average_precision
    return (roc_auc, ap)

# Random Forest - ROC and precision-recall curves for both classes (0, 1)
roc_auc, ap = plotagem_curvas("Random Forest", modelo, X_test, y_test, folds, 0)
roc_auc2, ap2 = plotagem_curvas("Random Forest", modelo, X_test, y_test, folds, 1)
Here is the confusion matrix: [confusion matrix image]
And the curves: [ROC and precision-recall curve plots]
I don't know if I made a mistake when calling the function "plot_precision_recall_curve".
Remember, your precision/recall curve is plotted for different thresholds of your classifier (I assume you are using a random forest).
Precision/recall curve
The curve is calculated by asking: "If I classify inputs with a model output of 0.1 (or greater) as 'stroke', what is my precision/recall? What if I instead use 0.2 (or greater) as the cutoff, what are precision and recall then? What about 0.3, 0.4, ..., 1.0?" The answers to those questions are the points of the curve you plotted.
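As a rough sketch of what that threshold sweep does (assuming modelo, X_test and y_test from your code, with class 1 = "Stroke" as the positive class):

import numpy as np
from sklearn.metrics import precision_score, recall_score

# Predicted probability of the positive class ("Stroke" = 1)
proba = modelo.predict_proba(X_test)[:, 1]

# Sweep a range of thresholds and report precision/recall at each one
for threshold in np.arange(0.1, 1.0, 0.1):
    y_pred = (proba >= threshold).astype(int)
    p = precision_score(y_test, y_pred, zero_division=0)
    r = recall_score(y_test, y_pred, zero_division=0)
    print(f"threshold={threshold:.1f}  precision={p:.2f}  recall={r:.2f}")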
Confusion matrix
Your confusion matrix, by contrast, is based on a single threshold, i.e. you might say "I classify all objects as 'stroke' if the score/output from my model is 0.5 or greater" (which is often the default threshold in the binary case if you don't change it). You then classify your test set with that one threshold and build the confusion matrix from the result.
Thus your precision = 0.2 and recall = 0.4 (I'll guess) are based on the threshold 0.5, whereas your curve is based on many different thresholds and answers the question "is there a threshold which gives a good trade-off between precision and recall?".
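A minimal sketch of that single-threshold classification, using the same modelo/X_test/y_test names from your code:

from sklearn.metrics import confusion_matrix

# Classify as "Stroke" only when the predicted probability is at least 0.5
proba = modelo.predict_proba(X_test)[:, 1]
y_pred_05 = (proba >= 0.5).astype(int)

# For a standard binary classifier this should match modelo.predict(X_test)
print(confusion_matrix(y_test, y_pred_05))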
Get the optimal threshold
You can get the precision/recall values for each threshold with scikit-learn's precision_recall_curve, choose the threshold with the precision/recall trade-off you want, and then create your confusion matrix at that threshold.
I would assume that if you use it to check your model, you'll find those 0.2 and 0.4 values at a threshold around 0.5.
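For example (again assuming modelo, X_test and y_test from your code; picking the threshold that maximises F1 is just one possible criterion):

import numpy as np
from sklearn.metrics import precision_recall_curve

proba = modelo.predict_proba(X_test)[:, 1]
precision, recall, thresholds = precision_recall_curve(y_test, proba, pos_label=1)

# Example criterion: pick the threshold with the best F1 score
f1 = 2 * precision * recall / (precision + recall + 1e-12)
best = np.argmax(f1[:-1])  # the last precision/recall pair has no threshold
print("best threshold:", thresholds[best],
      "precision:", precision[best],
      "recall:", recall[best])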