Why does a Keras neural network predict the same number for all different images?

I'm trying to use a Keras neural network in TensorFlow to recognize handwritten digits. But I don't know why, when I call predict(), it returns the same result for every input image.

Here is the code:

  import tensorflow as tf

  ### Train dataset ###
  mnist = tf.keras.datasets.mnist
  (x_train, y_train), (x_test, y_test) = mnist.load_data()
  # Scale pixel values from [0, 255] to [0, 1]
  x_train = x_train / 255
  x_test = x_test / 255

  model = tf.keras.models.Sequential()
  # ... (layers not shown) ...

  model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
  model.fit(x_train, y_train, epochs=5)

The result looks like this:

Epoch 1/5
1875/1875 [==============================] - 2s 672us/step - loss: 0.2620 - accuracy: 0.9248
Epoch 2/5
1875/1875 [==============================] - 1s 567us/step - loss: 0.1148 - accuracy: 0.9658
Epoch 3/5
1875/1875 [==============================] - 1s 559us/step - loss: 0.0784 - accuracy: 0.9764
Epoch 4/5
1875/1875 [==============================] - 1s 564us/step - loss: 0.0596 - accuracy: 0.9817
Epoch 5/5
1875/1875 [==============================] - 1s 567us/step - loss: 0.0462 - accuracy: 0.9859

The code I use to test an image is below:

  img = cv.imread('path/to/1.png')
  img = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
  img = cv.resize(img, (28, 28))
  if cv.countNonZero(255 - img) == 0:
      img = np.invert(img)
  img = np.array([img])  # add the batch dimension
  prediction = model.predict(img)
  result = np.argmax(prediction)
  print(f'Result: {result}')

The result is:

Input with number 1

plt show: [image: PlT show 1]

[[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]]
Result: 3

Input with number 2

plt show: [image: PlT show 2]

[[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]]
Result: 3


  • Normalize your data at inference time the same way you did for the training set:

    img = np.array([img]) / 255

    Check this answer (Inference) for more details.
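To see why skipping the division by 255 can produce the same one-hot output for every image, here is a minimal sketch. It does not use the model from the question: the `logits` values are made up for illustration, and multiplying them by 255 only approximates what happens when unscaled pixels pass through a network trained on inputs in [0, 1].

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

# Hypothetical logits for one image scaled to [0, 1] (illustrative values only).
logits = np.array([0.1, 0.5, 0.2, 0.9, 0.3, 0.0, 0.4, 0.2, 0.1, 0.3])

probs_scaled = softmax(logits)          # soft distribution over the 10 digits
probs_unscaled = softmax(255 * logits)  # inputs about 255x larger, so the logits blow up

print(np.round(probs_scaled, 2))
print(np.round(probs_unscaled, 2))
```

The second vector rounds to a one-hot vector, much like the `[[0. 0. 0. 1. ...]]` output in the question: once the logits are that large, small differences between images no longer change the rounded probabilities.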

    Based on your 3rd comment, here are some details.

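    import cv2
    import matplotlib.pyplot as plt
    import tensorflow as tf

    def input_prepare(img):
        img = cv2.resize(img, (28, 28))  # match the 28x28 MNIST input size
        img = cv2.bitwise_not(img)  # invert so the digit is light on a dark background
        img = tf.cast(tf.divide(img, 255), tf.float64)  # scale to [0, 1] as in training
        img = tf.expand_dims(img, axis=0)  # add the batch dimension
        return img

    img = cv2.imread('/content/1.png')
    orig = img.copy()  # save for plotting later on
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # gray scaling
    img = input_prepare(img)
    plt.imshow(tf.reshape(img, shape=[28, 28]))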

    plt.imshow(cv2.cvtColor(orig, cv2.COLOR_BGR2RGB))

    It works as expected. However, resizing breaks up the digit's strokes, so some of their spatial information is lost. The model tolerates that here, but if the damage gets much worse, it will predict the wrong digit. For example, the model predicts the wrong digit for the following input.

    plt.imshow(cv2.cvtColor(orig, cv2.COLOR_BGR2RGB))
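The stroke breakage described above can be reproduced on a toy array. This is only a sketch under simplifying assumptions: it compares naive pixel skipping with 2x2 area averaging, whereas `cv2.resize` uses bilinear interpolation by default, so the real effect is usually milder. The array size and stroke position are made up for illustration.

```python
import numpy as np

# Toy 56x56 "image": white background (255) with a 1-pixel-wide dark stroke.
img = np.full((56, 56), 255, dtype=np.uint8)
img[:, 27] = 0

# Naive 2x downsampling that keeps every second pixel: the stroke sits on an
# odd column, so it is skipped entirely.
subsampled = img[::2, ::2]

# Area averaging: each output pixel is the mean of a 2x2 input block, so the
# stroke survives as a fainter gray line.
averaged = img.reshape(28, 2, 28, 2).mean(axis=(1, 3))

print(subsampled.min())  # darkest pixel after pixel skipping
print(averaged.min())    # darkest pixel after area averaging
```

If thin strokes vanish when shrinking a real image, passing `interpolation=cv2.INTER_AREA` to `cv2.resize` is one option worth trying, since it averages over source pixels in a similar way.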

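    To fix this, we can apply cv2.erode after resizing to thicken the strokes. At that point the digit is still dark on a light background, so erosion grows the dark pixels. For example: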

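    import numpy as np

    def input_prepare(img):
        img = cv2.resize(img, (28, 28))
        img = cv2.erode(img, np.ones((2, 2), np.uint8))  # thicken the dark strokes
        img = cv2.bitwise_not(img)  # invert so the digit is light on a dark background
        img = tf.cast(tf.divide(img, 255), tf.float64)  # scale to [0, 1] as in training
        img = tf.expand_dims(img, axis=0)  # add the batch dimension
        return img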

    This is perhaps not the best approach, but the thicker strokes are easier for the model to recognize.
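To illustrate why erosion thickens a dark digit on a light background, here is a minimal NumPy sketch. The helper `erode_2x2` is a hypothetical, simplified stand-in for `cv2.erode` with a 2x2 kernel: it takes the minimum over each 2x2 window, but its window placement and border handling are assumptions and may differ from OpenCV's.

```python
import numpy as np

def erode_2x2(img):
    """Minimum filter over 2x2 windows (a simplified stand-in for cv2.erode)."""
    # Pad the bottom and right edges with white so the output keeps the input shape.
    p = np.pad(img, ((0, 1), (0, 1)), mode="constant", constant_values=255)
    # Each output pixel is the minimum of the 2x2 window starting at that pixel.
    return np.minimum.reduce([p[:-1, :-1], p[1:, :-1], p[:-1, 1:], p[1:, 1:]])

# Toy 6x6 image: white background with a 1-pixel-wide dark vertical stroke.
img = np.full((6, 6), 255, dtype=np.uint8)
img[:, 3] = 0

thick = erode_2x2(img)
print((img == 0).sum(), (thick == 0).sum())  # dark pixel count before and after
```

Because the minimum picks the darkest value in each window, every dark pixel spreads to its neighbor, and the stroke becomes two pixels wide instead of one.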

    plt.imshow(cv2.cvtColor(orig, cv2.COLOR_BGR2RGB))
