
How to get a single class label from softmax output instead of probabilities, and get a confusion matrix

test_generator = test_datagen.flow_from_directory(
    target_size=(150, 150),
test_loss, test_acc = model.evaluate_generator(test_generator, steps=28)
print('test acc:', test_acc)

predict = model.predict_generator(test_generator, steps=28, verbose=0)
print('Prediction: ', predict)

test_imgs, test_labels = next(test_generator)

cm = confusion_matrix(test_labels, predict)

I have 2 problems with the above code. First, I get an error about a different number of samples in test_labels and predict. My test_labels only holds 20 samples (one batch, as set by the batch size), whereas predict from model.predict_generator holds 560 images in total (20 * 28 steps):

ValueError: Found input variables with inconsistent numbers of samples: [20, 560]

The second problem: how do I convert my softmax output (float probabilities for my 4 image classes) into integer class labels? I get an error when I change steps to 1 (to test only 20 samples instead of the 560 in the problem above):

ValueError: Classification metrics can't handle a mix of multilabel-indicator and continuous-multioutput targets

which I think happens because each prediction is a list of 4 values (one per class), e.g.:

Prediction:  [[2.9905824e-12 5.5904431e-10 1.8195983e-11 1.0000000e+00]
 [2.7073351e-21 1.0000000e+00 8.3221777e-21 4.9091786e-22]
 [4.2152173e-05 6.1331893e-04 3.7486094e-05 9.9930704e-01]

Is there any way to get the exact class my model predicts (as is used for my test loss and test accuracy)?

Or is there any other simple way to get the confusion matrix in Keras that I don't know of? :(

Edit 1 (obtained from desertnaut): The returned test_labels variable is as below

array([[0., 0., 0., 1.],
   [0., 0., 0., 1.],
   [0., 1., 0., 0.],
   [0., 1., 0., 0.],
   [0., 1., 0., 0.],
   [0., 0., 0., 1.],
   [1., 0., 0., 0.],
   [0., 0., 0., 1.],
   [0., 1., 0., 0.],
   [0., 0., 0., 1.],
   [0., 0., 0., 1.],
   [0., 1., 0., 0.],
   [0., 0., 0., 1.],
   [0., 0., 0., 1.],
   [0., 0., 1., 0.],
   [0., 1., 0., 0.],
   [0., 1., 0., 0.],
   [0., 0., 0., 1.],
   [0., 0., 0., 1.],
   [0., 0., 1., 0.]], dtype=float32), array([[1., 0., 0., 0.],
   [0., 0., 0., 1.],

^ This is only 1 cycle (there are 28 in total, so 27 more of these lists). This snippet is from somewhere in the middle of the output; the list is too long to show the topmost array (I can't scroll to the top of Spyder's output box). I tried using argmax, as for the second problem above, e.g.:

test_class = np.argmax(test_labels, axis=1)
test_class = test_class.tolist()

But I didn't get the correct answer, I think because the loop structure is different. The output of predict_class as given by you is 1 list containing all 560 sample predictions, but test_labels comes as 28 separate batches. The output of predict_class looks like this, e.g.:

[3, 1, 1, 2, 0, 0, 3, 1, 2, 0, 0, 1, 2, 2, 1, 3, 2, 2, 0, 2, 0, 3, 0, 1, 3, 3, 1, 2, 0, 1, 1, 0, 2, 1, 0, 2, 1, 3, 1, 0, 1, 2, 2, 2, 1, 2, 2, 2, 2, 3, 2, 3, 1, 3, 1, 1, 3, 2, 2, 0, 1, 1, 0, 2, 1, 3, 3, 2, 0, 1, 1, 0, 3, 0, 0, 2, 3, 2, 1, 1, 2, 3, 0, 0, 2, 1, 3, 2, 3, 1, 0, 0, 3, 0, 3, 1, 1, 3, 1, 0, 1, 2, 0, 0, 0, 0, 3, 2, 2, 3, 3, 1, 3, 0, 3, 2, 0, 0, 0, 2, 1, 0, 2, 2, 1, 0, 1, 2, 2, 2, 3, 2, 1, 2, 2, 0, 0, 2, 3, 3, 1, 2, 2, 3, 0, 2, 1, 1, 3, 0, 1, 0, 1, 3, 3, 1, 3, 0, 1, 3, 0, 2, 1, 1, 3, 0, 1, 0, 1, 1, 3, 2, 3, 3, 0, 1, 1, 3, 2, 0, 3, 2, 0, 1, 3, 3, 2, 1, 1, 1, 0, 2, 0, 2, 2, 0, 2, 2, 0, 0, 1, 2, 2, 0, 0, 1, 1, 1, 0, 2, 2, 0, 3, 0, 3, 2, 2, 0, 1, 1, 1, 3, 0, 2, 2, 1, 3, 3, 3, 1, 2, 0, 3, 0, 0, 3, 1, 1, 3, 0, 2, 2, 2, 2, 3, 0, 2, 3, 0, 3, 2, 3, 2, 3, 3, 0, 0, 2, 3, 2, 0, 0, 3, 1, 3, 0, 0, 1, 1, 0, 1, 0, 0, 3, 0, 0, 1, 1, 3, 1, 3, 2, 1, 0, 1, 0, 2, 3, 0, 1, 2, 1, 2, 2, 2, 2, 0, 2, 2, 1, 3, 2, 2, 2, 1, 3, 3, 2, 0, 3, 0, 1, 2, 2, 2, 3, 1, 0, 2, 3, 2, 1, 0, 1, 2, 0, 2, 1, 2, 2, 2, 1, 0, 0, 0, 0, 0, 3, 3, 2, 1, 0, 0, 3, 0, 0, 2, 1, 0, 2, 3, 2, 3, 2, 1, 3, 0, 2, 1, 0, 0, 0, 1, 2, 2, 3, 2, 3, 2, 0, 3, 2, 1, 0, 0, 3, 2, 3, 0, 2, 0, 1, 0, 0, 3, 2, 3, 1, 3, 2, 2, 2, 0, 1, 2, 0, 2, 0, 0, 0, 3, 1, 3, 2, 3, 2, 1, 2, 3, 3, 1, 3, 3, 0, 1, 1, 2, 0, 1, 2, 3, 0, 2, 2, 2, 0, 0, 3, 0, 3, 3, 3, 3, 3, 3, 0, 1, 3, 0, 2, 3, 1, 0, 2, 3, 2, 3, 1, 1, 2, 1, 2, 3, 0, 2, 3, 3, 3, 3, 2, 3, 3, 3, 3, 0, 0, 2, 0, 1, 0, 3, 1, 0, 0, 2, 1, 2, 3, 3, 2, 2, 1, 2, 2, 0, 2, 0, 3, 3, 3, 3, 3, 2, 3, 3, 3, 3, 0, 3, 2, 2, 3, 0, 1, 3, 2, 3, 3, 0, 3, 1, 2, 3, 3, 0, 3, 3, 3, 2, 2, 0, 3, 3, 3, 0, 1, 1, 1, 0, 0, 0, 0, 1, 2, 2, 2, 3, 0, 0, 1, 1, 0, 2, 0, 2, 0, 3, 3, 1, 0, 2, 2, 1, 0, 0, 3, 0, 3, 3, 3]

^ 1 list of 560 samples.

The output of test_class (with the argmax edit), e.g.:

[[7, 3, 0, 2], [9, 3, 2, 0], [0, 2, 9, 6], [0, 2, 3, 1], [2, 3, 0, 1], [6, 0, 1, 4], [5, 0, 1, 2], [1, 3, 2, 0], [0, 2, 3, 5], [0, 1, 3, 7], [1, 0, 8, 4], [3, 7, 1, 0], [3, 5, 0, 2], [9, 0, 3, 1], [0, 2, 1, 9], [8, 5, 1, 0], [2, 0, 1, 8], [0, 5, 1, 3], [0, 17, 1, 4], [2, 1, 7, 0], [0, 4, 5, 1], [1, 2, 0, 4], [0, 2, 3, 1], [2, 0, 1, 3], [3, 2, 1, 0], [0, 2, 7, 6], [5, 0, 18, 2], [2, 0, 7, 1]]

Is there a function in numpy or scipy to turn it into 1 list of 560 samples instead of 28 lists of 20-sample batches?


Thanks! Both are now in 1 list. However, is there any way to check whether the samples are shuffled the same way? I obtained 87.8% classification accuracy, but the confusion matrix I get shows very poor agreement between labels and predictions.

[[33 26 35 46]
 [43 25 41 31]
 [38 36 36 30]
 [32 30 39 39]]


  • For your second problem, since your predictions come as softmax probabilities, you should simply take the argmax along the class axis; using your 3 shown predictions as an example:

    import numpy as np
    # your shown predictions:
    predict = np.array( [[2.9905824e-12, 5.5904431e-10, 1.8195983e-11 ,1.0000000e+00],
                         [2.7073351e-21, 1.0000000e+00, 8.3221777e-21, 4.9091786e-22],
                         [4.2152173e-05, 6.1331893e-04, 3.7486094e-05, 9.9930704e-01]])
    predict_class = np.argmax(predict, axis=1)
    predict_class = predict_class.tolist()
    # [3, 1, 3]

    Regarding your first problem: I assume you cannot independently get your test_labels for the whole of your dataset (otherwise presumably you would use this array of length 560 for your confusion matrix); if so, you could use something like [updated after OP edit]:

    test_labels = []
    for i in range(28):
        test_imgs, batch_labels = next(test_generator)
        batch_labels = np.argmax(batch_labels, axis=1).tolist()
        test_labels = test_labels + batch_labels

    after which both your test_labels and predict_class will be lists of length 560, and you should be able to get the confusion matrix for the whole of your test set as

    from sklearn.metrics import confusion_matrix
    cm = confusion_matrix(test_labels, predict_class)
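On the question of a NumPy function for flattening: the [[7, 3, 0, 2], ...] output in the question is what np.argmax(..., axis=1) returns when the 28 batches are stacked into one 3-D array, because axis 1 is then the sample axis rather than the class axis. A minimal sketch of the difference, using synthetic one-hot batches as a stand-in for the generator output (the names class_ids, batches, wrong and flat are only illustrative):

```python
import numpy as np

# Synthetic stand-in for 28 one-hot label batches, each of shape (20, 4).
rng = np.random.default_rng(0)
class_ids = rng.integers(0, 4, size=(28, 20))      # true class of every sample
batches = np.eye(4, dtype=np.float32)[class_ids]   # shape (28, 20, 4)

# axis=1 reduces over the 20 samples in each batch, not over the 4 classes,
# so the result has shape (28, 4) and holds sample positions, not class ids.
wrong = np.argmax(batches, axis=1)

# Merge the batch and sample axes first, then take argmax over the classes.
flat = np.argmax(batches.reshape(-1, 4), axis=1)   # shape (560,)

print(wrong.shape, flat.shape)
```

np.argmax(batches, axis=-1).ravel() should give the same flat result. This only fixes the shape; it does not address the ordering of the samples.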

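    To ensure that the predictions and test labels are indeed aligned, you should add the shuffle=False argument to your test_datagen.flow_from_directory() call (the default value is True; see the docs). With shuffling on, each pass over the generator yields the images in a different order, so labels collected in one pass will not line up with predictions made in another.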
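The confusion matrix posted in the comment has 133 of 560 samples on its diagonal (about 24%), which is roughly chance level for four balanced classes and is what misaligned ordering would be expected to produce. The following is a rough simulation of that effect on synthetic data, not a reproduction of the actual model; the helper names confusion_matrix_counts and accuracy_from_cm are hypothetical and written in plain Python so the sketch has no dependencies.

```python
import random

def confusion_matrix_counts(y_true, y_pred, n_classes):
    """Count matrix: rows are true classes, columns are predicted classes."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def accuracy_from_cm(cm):
    """Fraction of samples on the diagonal."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total

n_samples, n_classes = 560, 4

# Synthetic ground truth: 140 samples per class.
y_true = [i % n_classes for i in range(n_samples)]

# Synthetic predictions: wrong on every 8th sample, so 87.5% accurate.
y_pred = [(t + 1) % n_classes if i % 8 == 0 else t
          for i, t in enumerate(y_true)]

# Simulate predictions coming from a differently shuffled pass over the data.
order = list(range(n_samples))
random.Random(0).shuffle(order)
y_pred_shuffled = [y_pred[i] for i in order]

aligned_acc = accuracy_from_cm(
    confusion_matrix_counts(y_true, y_pred, n_classes))
misaligned_acc = accuracy_from_cm(
    confusion_matrix_counts(y_true, y_pred_shuffled, n_classes))

print('aligned:', aligned_acc)
print('misaligned:', misaligned_acc)
```

The predictions themselves are identical in both cases; only their order relative to the labels changes, yet the diagonal of the second matrix drops to around a quarter of the samples.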

    Given the confusion matrix, if you need further classification measures like precision, recall etc, have a look at my answer here.
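As a minimal sketch of how such measures follow from a confusion matrix, assuming the scikit-learn convention that rows are true classes and columns are predicted classes: per-class precision is the diagonal entry divided by its column sum, and recall is the diagonal entry divided by its row sum. The function name per_class_precision_recall and the small matrix below are made up for illustration and are not taken from the question.

```python
def per_class_precision_recall(cm):
    """Return a list of (precision, recall) tuples, one per class.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    results = []
    for k in range(n):
        true_positives = cm[k][k]
        predicted_as_k = sum(cm[i][k] for i in range(n))  # column sum
        actually_k = sum(cm[k])                           # row sum
        precision = true_positives / predicted_as_k if predicted_as_k else 0.0
        recall = true_positives / actually_k if actually_k else 0.0
        results.append((precision, recall))
    return results

# Hypothetical 4-class confusion matrix with 10 samples per true class.
example_cm = [[8, 2, 0, 0],
              [1, 9, 0, 0],
              [0, 0, 10, 0],
              [0, 1, 0, 9]]

results = per_class_precision_recall(example_cm)
for class_id, (precision, recall) in enumerate(results):
    print(class_id, precision, recall)
```

In practice sklearn.metrics.classification_report(test_labels, predict_class) reports the same quantities directly from the label lists.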