python scikit-learn cross-validation confusion-matrix

Confusion matrix from probabilities

I have the following scikit-learn machine learning pipeline:

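import numpy as np
from numpy import interp
from sklearn import svm
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import StratifiedKFold

cv = StratifiedKFold(n_splits=6)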
classifier = svm.SVC(kernel='linear', probability=True,
                     random_state=random_state)

tprs = []
aucs = []
mean_fpr = np.linspace(0, 1, 100)

i = 0
for train, test in cv.split(X, y):
    probas_ = classifier.fit(X[train], y[train]).predict_proba(X[test])
    # Compute the ROC curve and the area under the curve
    fpr, tpr, thresholds = roc_curve(y[test], probas_[:, 1])
    tprs.append(interp(mean_fpr, fpr, tpr))
    tprs[-1][0] = 0.0
    roc_auc = auc(fpr, tpr)
    aucs.append(roc_auc)
    i += 1

Now I would also like to compute (and plot) the confusion matrix. How can I do this with the code above? I only get probabilities (which I need for calculating the AUC). I have 4 classes (1...4).

Solution

You can follow this scikit-learn example to plot the confusion matrix:

http://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html#sphx-glr-auto-examples-model-selection-plot-confusion-matrix-py

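That requires discrete class labels rather than probabilities. You can derive them from your probas_ variable by taking the most probable column in each row. Note that np.argmax returns column indices (0...3), not your labels (1...4), so map the indices back through classifier.classes_:

y_pred = classifier.classes_[np.argmax(probas_, axis=1)]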

Now you can pass y[test] and y_pred to sklearn.metrics.confusion_matrix.
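
To show how this fits into the cross-validation loop, here is a minimal sketch that sums one confusion matrix per fold into a single matrix. The synthetic data from make_classification, the random_state value of 0, and the names labels and total_cm are assumptions for illustration and are not part of the original code. Also be aware that for SVC with probability=True, the argmax of predict_proba can occasionally disagree with what predict returns, so this may not match a matrix built from classifier.predict exactly.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

# Synthetic stand-in for the real data (assumption): 4 classes labelled 1..4.
X, y = make_classification(n_samples=240, n_features=8, n_informative=4,
                           n_classes=4, n_clusters_per_class=1,
                           random_state=0)
y = y + 1  # shift labels from 0..3 to 1..4

cv = StratifiedKFold(n_splits=6)
classifier = SVC(kernel='linear', probability=True, random_state=0)

# Fix the label order so every fold yields a matrix with the same layout.
labels = np.unique(y)
total_cm = np.zeros((len(labels), len(labels)), dtype=np.int64)

for train, test in cv.split(X, y):
    probas_ = classifier.fit(X[train], y[train]).predict_proba(X[test])
    # Columns of probas_ follow classifier.classes_, so map indices to labels.
    y_pred = classifier.classes_[np.argmax(probas_, axis=1)]
    # Rows are true labels, columns are predicted labels.
    total_cm += confusion_matrix(y[test], y_pred, labels=labels)

print(total_cm)
```

Because each sample lands in exactly one test fold, the entries of total_cm add up to the number of samples. In recent scikit-learn versions, ConfusionMatrixDisplay(total_cm, display_labels=labels).plot() should draw the summed matrix, assuming matplotlib is installed.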