Tags: tensorflow, validation, machine-learning, keras, image-classification

Identify misclassified images with TensorFlow


I have been working on an image classifier and I would like to look at the images that the model misclassified on the validation set. My idea was to compare the true and predicted labels and use the indices of the values that don't match to retrieve the images. However, when I compute the accuracy this way, I don't get the same result as the evaluate method. This is what I have done:

I import the data using this function:

import numpy as np
import tensorflow as tf

def create_dataset(folder_path, name, split, seed, shuffle=True):
  return tf.keras.preprocessing.image_dataset_from_directory(
    folder_path, labels='inferred', label_mode='categorical', color_mode='rgb',
    batch_size=32, image_size=(320, 320), shuffle=shuffle, interpolation='bilinear',
    validation_split=split, subset=name, seed=seed)

train_set = create_dataset(dir_path, 'training', 0.1, 42)
valid_set = create_dataset(dir_path, 'validation', 0.1, 42)

# output:
# Found 16718 files belonging to 38 classes.
# Using 15047 files for training.
# Found 16718 files belonging to 38 classes.
# Using 1671 files for validation.

Then to evaluate the accuracy on the validation set I use this line:

model.evaluate(valid_set)

# output:
# 53/53 [==============================] - 22s 376ms/step - loss: 1.1322 - accuracy: 0.7349
# [1.1321837902069092, 0.7348892688751221]

which is fine, since the values are exactly the same as the ones I got in the last epoch of training.

To extract the true labels from the validation set I use this line of code, based on this answer. Note that I need to create the validation set again because every time I iterate over the variable that refers to it, the validation set gets reshuffled. I thought this shuffling was the cause of the inconsistent accuracy, but recreating the dataset apparently didn't solve the problem.

y_val_true = np.concatenate([y for x, y in create_dataset(dir_path, 'validation', 0.1, 42)], axis=0)
y_val_true = np.argmax(y_val_true, axis=1)
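
For what it's worth, the reshuffling itself is easy to observe: iterating over the same dataset twice yields the labels in a different order. A quick sanity check, assuming the create_dataset defaults above (shuffle=True):

ds = create_dataset(dir_path, 'validation', 0.1, 42)

# Collect the label order from two separate passes over the same dataset.
first_pass = np.concatenate([np.argmax(y, axis=1) for x, y in ds])
second_pass = np.concatenate([np.argmax(y, axis=1) for x, y in ds])

# With shuffle=True the order differs between passes, so labels gathered
# in one pass do not line up with predictions made in another.
print((first_pass == second_pass).mean())  # typically well below 1.0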

I make the prediction:

y_val_pred = model.predict(create_dataset(dir_path, 'validation', 0.1, 42))
y_val_pred = np.argmax(y_val_pred, axis=1)

And finally I compute the accuracy once again to verify that everything is OK:

m = tf.keras.metrics.Accuracy()
m.update_state(y_val_true, y_val_pred)
m.result().numpy()

# output:
# 0.082585275

As you can see, instead of getting the same value the evaluate method returned, I now get only about 8%.

I would be truly grateful if you could point out where my approach is flawed. Since this is the first question I post, I apologize in advance for any mistakes.


Solution

  • This method can help provide insights if you want to display or analyse the images batch by batch:

    m = tf.keras.metrics.Accuracy()
    
    # Iterating over individual batches to keep track of the images
    # being fed to the model.
    for valid_images, valid_labels in valid_set.as_numpy_iterator():
        y_val_true = np.argmax(valid_labels, axis=1)
    
        # model.predict accepts a batch of images directly, so the
        # collected images can be fed to it as they are.
        y_val_pred = model.predict(valid_images)
        y_val_pred = np.argmax(y_val_pred, axis=1)
       
        # Update the state of the accuracy metric after every batch
        m.update_state(y_val_true, y_val_pred)
    
    m.result().numpy()
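
    Since the original goal was to identify the misclassified images, the same batch-by-batch loop can collect them along the way. A minimal sketch under the same assumptions (misclassified_images and misclassified_labels are names introduced here for illustration):

    misclassified_images = []
    misclassified_labels = []

    for valid_images, valid_labels in valid_set.as_numpy_iterator():
        y_true = np.argmax(valid_labels, axis=1)
        y_pred = np.argmax(model.predict(valid_images), axis=1)

        # Within a batch the images and labels stay paired, so a boolean
        # mask picks out exactly the images the model got wrong.
        wrong = y_true != y_pred
        misclassified_images.append(valid_images[wrong])
        misclassified_labels.append(y_true[wrong])

    misclassified_images = np.concatenate(misclassified_images)
    misclassified_labels = np.concatenate(misclassified_labels)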
    

    If you want to feed everything at once:

    valid_ds = create_dataset(dir_path, 'validation', 0.1, 42, shuffle=False)
    y_val_true = np.concatenate([y for x, y in valid_ds], axis=0)
    y_val_true = np.argmax(y_val_true, axis=1)
    y_val_pred = model.predict(valid_ds)
    y_val_pred = np.argmax(y_val_pred, axis=1)
    
    m = tf.keras.metrics.Accuracy()
    m.update_state(y_val_true, y_val_pred)
    m.result().numpy()
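
    With shuffle=False the dataset yields the images in a fixed order, so the indices where y_val_true and y_val_pred disagree point at specific files. A sketch of how to map them back, assuming a recent TF version where the dataset returned by image_dataset_from_directory exposes a file_paths attribute:

    # Indices of the validation images the model misclassified.
    misclassified_idx = np.where(y_val_true != y_val_pred)[0]

    # file_paths lists the files in the same (unshuffled) order the
    # dataset yields them, so indexing it recovers the image paths.
    for i in misclassified_idx:
        print(valid_ds.file_paths[i], 'true:', y_val_true[i], 'pred:', y_val_pred[i])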
    

    As for the original code: with shuffle=True, the true labels and the predictions come from two independently reshuffled passes over the dataset, so they no longer line up, which is why the accuracy drops to near-chance level. The shuffle=False in the second snippet is what avoids this.