Tags: python, pandas, crosstab, confusion-matrix

How to create a confusion matrix with pandas.crosstab when all the predicted values are 1?


I am learning about performance metrics. I have a DataFrame with 10,100 rows (index 0 to 10099) and two columns: the actual labels (y) and the predicted labels (labels). I would like to create a confusion matrix with pandas.

My first attempt:

import pandas as pd

y_actual = df5a["y"].rename("Actual")
y_predicted = df5a["labels"].rename("Predicted")
confusion_matrix_5a = pd.crosstab(y_actual, y_predicted)
confusion_matrix_5a

Output 1:

Predicted   1
Actual  
0.0        100
1.0        10000
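A minimal way to reproduce output 1 without the original data is sketched below. The contents of df5a are an assumption: synthetic data with the same counts as output 1 (100 rows with Actual 0, 10000 rows with Actual 1, every prediction 1), using the column names y and labels from the snippet above.

```python
import pandas as pd

# Synthetic stand-in for df5a (an assumption, not the real data):
# 100 rows with actual 0, 10000 rows with actual 1, every prediction 1.
df5a = pd.DataFrame({
    "y": [0.0] * 100 + [1.0] * 10000,
    "labels": [1] * 10100,
})

y_actual = df5a["y"].rename("Actual")
y_predicted = df5a["labels"].rename("Predicted")
confusion_matrix_5a = pd.crosstab(y_actual, y_predicted)

# crosstab only creates a column for each value that actually occurs,
# so there is a single "Predicted" column, labelled 1.
print(confusion_matrix_5a)
print(confusion_matrix_5a.columns.tolist())
```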

After checking the predicted column, I realized that every predicted value was 1, so crosstab produced only one column. To force pandas.crosstab() to create the full matrix, I added an extra row to my DataFrame (Actual = 0, Predicted = 0).

Output 2 (with the extra row):

Predicted   0   1
Actual      
0.0         1   100
1.0         0   10000

The correct confusion matrix, without the extra row, should be:

Predicted   0   1
Actual      
0.0         0   100
1.0         0   10000

The 1 in output 2 comes only from the extra row I added. I know its effect on accuracy is negligible because I have so many rows. Is there another way to make pandas.crosstab() produce the full matrix when one of the columns contains only a single unique value, without adding the extra row?
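To put a number on "negligible", here is a small sketch that computes accuracy straight from the cell counts of output 2 and of the correct matrix (accuracy is the diagonal sum divided by the total). It also shows a caveat: a metric that depends on the Predicted 0 column, such as precision for class 0, is determined entirely by the artificial row. The dictionary layout and the helper names are illustrative, not part of any library.

```python
# Cell counts keyed by (Actual, Predicted), copied from the matrices above.
real = {(0, 0): 0, (0, 1): 100, (1, 0): 0, (1, 1): 10000}
padded = {(0, 0): 1, (0, 1): 100, (1, 0): 0, (1, 1): 10000}


def accuracy(cm):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = cm[(0, 0)] + cm[(1, 1)]
    return correct / sum(cm.values())


def precision_class0(cm):
    """Precision for class 0; None when nothing was predicted as 0 (0 / 0)."""
    predicted_0 = cm[(0, 0)] + cm[(1, 0)]
    return cm[(0, 0)] / predicted_0 if predicted_0 else None


acc_real, acc_padded = accuracy(real), accuracy(padded)
print(f"accuracy: {acc_real:.6f} -> {acc_padded:.6f}")
print("precision for class 0:", precision_class0(real), "->", precision_class0(padded))
```

The accuracy changes by less than one millionth, but precision for class 0 goes from undefined to 1.0, so the padding trick is only safe for the overall accuracy.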


Solution

  • pd.crosstab only creates rows and columns for values that actually occur in the data, so you need to add the missing column yourself. A simple way to do that is DataFrame.reindex.

    Let's say conf_mat is your confusion matrix with only one column.

    Then conf_mat.reindex([0, 1], axis='columns', fill_value=0) forces the DataFrame to have columns labelled 0 and 1, filling the newly added column with zeros.

    Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.reindex.html
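The reindex approach can be sketched end to end as follows. The data is an assumption (synthetic values mirroring the question), and full_confusion_matrix is a hypothetical helper, not a pandas function: it extends the same idea to both axes, for the case where a class is also missing from the actual labels.

```python
import pandas as pd

# Synthetic data mirroring the question (an assumption, not the real df5a):
# 100 actual 0s, 10000 actual 1s, and every prediction equal to 1.
df5a = pd.DataFrame({
    "y": [0.0] * 100 + [1.0] * 10000,
    "labels": [1] * 10100,
})

conf_mat = pd.crosstab(df5a["y"].rename("Actual"),
                       df5a["labels"].rename("Predicted"))

# Add the missing "Predicted = 0" column, filled with zero counts.
conf_mat = conf_mat.reindex([0, 1], axis="columns", fill_value=0)
print(conf_mat)


def full_confusion_matrix(actual, predicted, labels):
    """Hypothetical helper: a crosstab in which every label appears as both
    a row and a column, even if it never occurs in the data."""
    cm = pd.crosstab(actual.rename("Actual"), predicted.rename("Predicted"))
    # Reindex both axes so missing classes show up as all-zero rows/columns.
    return cm.reindex(index=labels, columns=labels, fill_value=0)


# Cast y to int so its labels match the integer labels [0, 1].
cm_full = full_confusion_matrix(df5a["y"].astype(int), df5a["labels"], [0, 1])
print(cm_full)
```

Casting y to int keeps the row and column labels the same type, so the row labels are 0 and 1 rather than 0.0 and 1.0.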