Tags: tensorflow, machine-learning, keras, image-classification

Good accuracy but bad predictions on Keras model


I am pretty new to machine learning, and I am trying to use Google Colab with TensorFlow/Keras to train an image classification model using transfer learning (ResNet50).

I started by using image datasets, using the following code:

import tensorflow as tf

data_root = '/tmp/OCT2017'

batch_size = 32
img_height = 160
img_width = 160

data_train = tf.keras.preprocessing.image_dataset_from_directory(data_root + '/train',
                                                                 labels='inferred',
                                                                 image_size=(img_height, img_width),
                                                                 batch_size=batch_size)

For small test datasets this worked pretty well, and I got both good accuracy and good predictions. But when I tried larger datasets, all the RAM provided by Colab was consumed, so I switched to generators, using:

data_generator = tf.keras.preprocessing.image.ImageDataGenerator()

data_train_gen = data_generator.flow_from_directory(data_root + '/train',
                                                    target_size=(img_height, img_width),
                                                    class_mode='sparse',
                                                    batch_size=batch_size,
                                                    shuffle=False)
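
The model itself is not shown in the post. Assuming it follows the standard Keras transfer-learning pattern (frozen ResNet50 backbone plus a 4-unit logits head, which is consistent with the from_logits=True loss and sparse labels used below), it would look roughly like this:

base_model = tf.keras.applications.ResNet50(input_shape=(img_height, img_width, 3),
                                            include_top=False,
                                            weights='imagenet')
base_model.trainable = False  # freeze the pretrained backbone

inputs = tf.keras.Input(shape=(img_height, img_width, 3))
x = tf.keras.applications.resnet50.preprocess_input(inputs)  # ResNet50's expected preprocessing
x = base_model(x, training=False)                            # keep BatchNorm layers in inference mode
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(4)(x)                        # raw logits for CNV, DME, DRUSEN, NORMAL
model = tf.keras.Model(inputs, outputs)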

and trained the model using:

base_learning_rate = 0.0001
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=base_learning_rate),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
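
The csv_logger callback passed to fit() below is not defined in the post; presumably it is a standard Keras CSVLogger along these lines (the log path here is just a placeholder):

# Assumed definition of the csv_logger used in fit() below; the path is a placeholder
csv_logger = tf.keras.callbacks.CSVLogger('/tmp/training_log.csv')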

with tf.device('/device:GPU:0'):
  epochs = 10
  history = model.fit(
    data_train_gen,
    validation_data=data_val_gen,
    epochs=epochs,
    callbacks=[csv_logger]
  )

I got good accuracy using this setup:

model.evaluate(data_test)

31/31 [==============================] - 3s 93ms/step - loss: 0.0925 - accuracy: 0.9742

[0.09248838573694229, 0.9741735458374023]

However, when I asked for predictions in order to build a confusion matrix, I got awful results:

from sklearn.metrics import classification_report, confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

y_pred = model.predict(data_test)
predicted_categories = tf.argmax(y_pred, axis=1)
true_categories = tf.concat([y for x, y in data_test_gen], axis=0)

cm = confusion_matrix(predicted_categories, true_categories)

heatmap = sns.heatmap(cm, annot=True, cmap='YlGn',
                      xticklabels=['CNV','DME','DRUSEN','NORMAL'],
                      yticklabels=['CNV','DME','DRUSEN','NORMAL'])
plt.xlabel("True Labels")
plt.ylabel("Predictions")
plt.show()

The predictions were only about 25% correct (essentially chance level for four classes, as the classification report below shows), and the confusion matrix appeared completely random:

classification_report(true_categories, predicted_categories, target_names=class_names, output_dict=True)

{'CNV': {'f1-score': 0.256198347107438, 'precision': 0.256198347107438, 'recall': 0.256198347107438, 'support': 242},
 'DME': {'f1-score': 0.23236514522821577, 'precision': 0.23333333333333334, 'recall': 0.23140495867768596, 'support': 242},
 'DRUSEN': {'f1-score': 0.25311203319502074, 'precision': 0.25416666666666665, 'recall': 0.25206611570247933, 'support': 242},
 'NORMAL': {'f1-score': 0.2827868852459016, 'precision': 0.2804878048780488, 'recall': 0.28512396694214875, 'support': 242},
 'accuracy': 0.256198347107438,
 'macro avg': {'f1-score': 0.256115602694144, 'precision': 0.25604653799637167, 'recall': 0.256198347107438, 'support': 968},
 'weighted avg': {'f1-score': 0.256115602694144, 'precision': 0.2560465379963717, 'recall': 0.256198347107438, 'support': 968}}


Solution

  • You do not show your test generator, but make sure you created it with shuffle=False. Also, what is the difference between data_test and data_test_gen? If you have a test generator called data_test, you can get the true labels from

    true_categories = data_test.labels
    

    Then use these in the confusion matrix and classification report; a complete sketch is shown below.
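
Putting the pieces together, a minimal sketch of the evaluation step might look like the following. The test directory path and the use of class_indices for the class names are assumptions; the essential points from the answer are shuffle=False on the test generator and taking the true labels from data_test.labels (note also that sklearn's confusion_matrix expects the true labels as the first argument):

from sklearn.metrics import classification_report, confusion_matrix
import tensorflow as tf

# Test generator: shuffle=False keeps predictions in the same order as .labels
data_test = data_generator.flow_from_directory(data_root + '/test',   # path is an assumption
                                               target_size=(img_height, img_width),
                                               class_mode='sparse',
                                               batch_size=batch_size,
                                               shuffle=False)

y_pred = model.predict(data_test)
predicted_categories = tf.argmax(y_pred, axis=1).numpy()
true_categories = data_test.labels          # integer labels in file order

# sklearn expects (y_true, y_pred)
cm = confusion_matrix(true_categories, predicted_categories)
class_names = list(data_test.class_indices.keys())
print(classification_report(true_categories, predicted_categories, target_names=class_names))

With a shuffled test generator, the order of the batches yielded during model.predict no longer matches the label order you read back from the generator, so the confusion matrix looks random even though model.evaluate (which compares each batch against its own labels) reports high accuracy.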