I'd like to understand why the precision-recall curve for the minority class ("1") looks so good while the precision (0.2) and recall (0.4) metrics for that same class are so bad. I used sklearn.metrics.plot_precision_recall_curve with pos_label=0 (majority class) and pos_label=1 (minority class). Below is the code I used.
import matplotlib.pyplot as plt
from sklearn import metrics

def plotagem_curvas(nome_modelo, modelo, X_test, y_test, folds, pos_label):
    roc_auc = 0
    ap = 0
    if pos_label == 0:
        classe = 'Not Stroke'
    else:
        classe = 'Stroke'
    fig, axs = plt.subplots(1, 2, figsize=(12, 4))
    axs[0].set_title("Curva ROC - " + nome_modelo + " \"" + classe + "\"", fontsize=10)
    disp = metrics.plot_roc_curve(modelo, X_test, y_test, ax=axs[0], pos_label=pos_label)
    roc_auc = disp.roc_auc
    axs[1].set_title("Curva Precision Recall - " + nome_modelo + " \"" + classe + "\"", fontsize=10)
    disp = metrics.plot_precision_recall_curve(modelo, X_test, y_test, ax=axs[1], pos_label=pos_label)
    ap = disp.average_precision
    return (roc_auc, ap)

# Random Forest - ROC and precision-recall curves for both classes (0, 1)
roc_auc, ap = plotagem_curvas("Random Forest", modelo, X_test, y_test, folds, 0)
roc_auc2, ap2 = plotagem_curvas("Random Forest", modelo, X_test, y_test, folds, 1)
Here is the confusion matrix: [confusion matrix image]
And the curves: [ROC and precision-recall curve plots]
I don't know if I made a mistake when calling the function "plot_precision_recall_curve".
Remember, your precision/recall curve is plotted for different thresholds of your classifier (I assume you are using a random forest).
Precision/recall curve
The curve is calculated by asking: "If I classify inputs with a model output of 0.1 (or greater) as 'stroke', what is my precision/recall? What if I instead use 0.2 (or greater) as the cutoff, what are precision and recall then? What about 0.3, 0.4, ..., 1.0?" The answers to those questions are the points of the curve you plotted.
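As a rough sketch of what that threshold sweep does (assuming modelo, X_test and y_test from your code, with class 1 = "Stroke" as the positive class):

import numpy as np
from sklearn.metrics import precision_score, recall_score

# Predicted probability of the positive class ("Stroke" = 1)
proba = modelo.predict_proba(X_test)[:, 1]

# Sweep a range of thresholds and report precision/recall at each one
for threshold in np.arange(0.1, 1.0, 0.1):
    y_pred = (proba >= threshold).astype(int)
    p = precision_score(y_test, y_pred, zero_division=0)
    r = recall_score(y_test, y_pred, zero_division=0)
    print(f"threshold={threshold:.1f}  precision={p:.2f}  recall={r:.2f}")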
Confusion matrix
Your confusion matrix, by contrast, is based on a single threshold, i.e. you might say "I classify all objects as 'stroke' if the score/output from my model is 0.5 or greater" (which is often the default threshold in the binary case if you don't change it). You then classify your test set with that one threshold and build the confusion matrix from the result.
Thus your precision = 0.2 and recall = 0.4 (I'll guess) are based on the threshold 0.5, whereas your curve is based on many different thresholds and answers the question "is there a threshold which gives a good trade-off between precision and recall?".
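A minimal sketch of that single-threshold classification, using the same modelo/X_test/y_test names from your code:

from sklearn.metrics import confusion_matrix

# Classify as "Stroke" only when the predicted probability is at least 0.5
proba = modelo.predict_proba(X_test)[:, 1]
y_pred_05 = (proba >= 0.5).astype(int)

# For a standard binary classifier this should match modelo.predict(X_test)
print(confusion_matrix(y_test, y_pred_05))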
Get the optimal threshold
You can get the precision/recall values for each threshold with scikit-learn's precision_recall_curve, choose the threshold with the precision/recall trade-off you want, and then create your confusion matrix at that threshold.
I would assume that if you use it to check your model, you'll find those 0.2 and 0.4 values at a threshold around 0.5.
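For example (again assuming modelo, X_test and y_test from your code; picking the threshold that maximises F1 is just one possible criterion):

import numpy as np
from sklearn.metrics import precision_recall_curve

proba = modelo.predict_proba(X_test)[:, 1]
precision, recall, thresholds = precision_recall_curve(y_test, proba, pos_label=1)

# Example criterion: pick the threshold with the best F1 score
f1 = 2 * precision * recall / (precision + recall + 1e-12)
best = np.argmax(f1[:-1])  # the last precision/recall pair has no threshold
print("best threshold:", thresholds[best],
      "precision:", precision[best],
      "recall:", recall[best])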