
CNN - Confusion Matrix wrong display


I have trained a model for handwritten-digit multiclass classification using a CNN in Keras. I am evaluating the model on the same training images to get an estimate of the accuracy of the algorithm; however, when I compute the confusion matrix for the CNN, only the first column is non-zero:

[[4132    0    0    0    0    0    0    0    0    0]
 [4684    0    0    0    0    0    0    0    0    0]
 [4177    0    0    0    0    0    0    0    0    0]
 [4351    0    0    0    0    0    0    0    0    0]
 [4072    0    0    0    0    0    0    0    0    0]
 [3795    0    0    0    0    0    0    0    0    0]
 [4137    0    0    0    0    0    0    0    0    0]
 [4401    0    0    0    0    0    0    0    0    0]
 [4063    0    0    0    0    0    0    0    0    0]
 [4188    0    0    0    0    0    0    0    0    0]]

I suspect the counts themselves are correct, since they match the total number of each digit in the dataset; however, the confusion matrix should look something like this:

[[4132    0    0    0    0    0    0    0    0    0]
 [   0 4684    0    0    0    0    0    0    0    0]
 [   0    0 4177    0    0    0    0    0    0    0]
 [   0    0    0 4351    0    0    0    0    0    0]
 [   0    0    0    0 4072    0    0    0    0    0]
 [   0    0    0    0    0 3795    0    0    0    0]
 [   0    0    0    0    0    0 4137    0    0    0]
 [   0    0    0    0    0    0    0 4401    0    0]
 [   0    0    0    0    0    0    0    0 4063    0]
 [   0    0    0    0    0    0    0    0    0 4188]]

The code is in this link

The data can be taken from the "train.csv" file in this Kaggle project.

What am I doing wrong in the code that produces this strange result?


Solution

  • I checked your code and I have a solution to your problem. The confusion matrix is being computed correctly. The problem is that your network is not learning at all and classifies every sample as 0. You can verify this by setting the `verbose` argument to 1 in the `fit` call; you will then observe an accuracy of about 10%, which is equivalent to random guessing.

    model.fit(X_train, Y_train, epochs=100, batch_size=32, validation_data=(X_train, Y_train), verbose=1)
    
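    For intuition, a one-column matrix like the one in the question is exactly what you get when every sample is predicted as class 0. A small NumPy sketch (with a hand-rolled `confusion` helper standing in for `sklearn.metrics.confusion_matrix`) demonstrates both cases on a toy 3-class problem:

    ```python
    import numpy as np

    def confusion(y_true, y_pred, num_classes):
        """Rows are true labels, columns are predicted labels."""
        cm = np.zeros((num_classes, num_classes), dtype=int)
        for t, p in zip(y_true, y_pred):
            cm[t, p] += 1
        return cm

    y_true = np.array([0, 0, 1, 1, 2, 2])

    # A perfect classifier yields a diagonal matrix
    cm_good = confusion(y_true, y_true, 3)

    # A classifier that always predicts 0 piles every count into
    # the first column -- the pattern seen in the question
    cm_bad = confusion(y_true, np.zeros_like(y_true), 3)
    ```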

    It's because you don't normalize your data. All you have to do is divide your dataset by 255 so that the pixel values lie in the range [0, 1]; after that, everything works properly and your network learns.

    X_train = X.reshape((-1, 28, 28, 1))
    X_train = X_train / 255.0
    Y_train = to_categorical(Y)
    

    You should apply the same preprocessing to your test set.
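
    As a minimal sketch of that shared preprocessing (pure NumPy, with a hypothetical `preprocess` helper and `np.eye` indexing standing in for Keras' `to_categorical`), applying it identically to any split might look like:

    ```python
    import numpy as np

    def preprocess(X_flat, Y, num_classes=10):
        """Reshape flat 784-pixel rows to 28x28x1 images, scale to
        [0, 1], and one-hot encode the labels."""
        X = X_flat.reshape((-1, 28, 28, 1)).astype(np.float32) / 255.0
        Y_onehot = np.eye(num_classes)[Y]
        return X, Y_onehot

    # Tiny fake batch: two 784-pixel rows with labels 3 and 7
    X_flat = np.random.randint(0, 256, size=(2, 784))
    Y = np.array([3, 7])
    X, Y_onehot = preprocess(X_flat, Y)
    ```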