I am training an image classifier for detecting rodents. When I fit the model and evaluate it, the metrics indicate it is performing well. However, when I run predictions on a tf.data dataset, the model seems to be guessing at random, yet when I predict single images it is accurate again.
Here is the function that does all the evaluating:
import os
import numpy as np
import matplotlib.pyplot as plt

def evaluate_model(hist, model_fp, model, train, test):
    # plot basic metrics
    metrics_path, _ = os.path.splitext(model_fp)
    train_labels = np.concatenate([y for x, y in train], axis=0)
    test_labels = np.concatenate([y for x, y in test], axis=0)
    plt.figure()
    ...  # plot accuracy metrics from history
    metrics = ['loss', 'prc', 'precision', 'recall']
    ...  # plot more metrics from history
    results = model.evaluate(test, verbose=0)
    for name, value in zip(model.metrics_names, results):
        print(name, ': ', value)
    print('\n')
    results = model.evaluate(train, verbose=0)
    for name, value in zip(model.metrics_names, results):
        print(name, ': ', value)
    print('\n')
    train_predictions = model.predict(train)
    test_predictions = model.predict(test)
    # PRC, ROC, CM GENERATED FROM THESE PREDICTIONS
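The elided PRC/ROC plotting works from test_predictions and test_labels, roughly like this (a minimal sketch; the use of sklearn.metrics and the output file names are my shorthand, not the exact code):

from sklearn.metrics import roc_curve, precision_recall_curve

# ROC curve from the test-set predictions
fpr, tpr, _ = roc_curve(test_labels, test_predictions.ravel())
plt.figure()
plt.plot(fpr, tpr)
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.savefig(metrics_path + '_roc.png')

# precision-recall curve from the same predictions
precision, recall, _ = precision_recall_curve(test_labels, test_predictions.ravel())
plt.figure()
plt.plot(recall, precision)
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.savefig(metrics_path + '_prc.png')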
Here is the function that grabs and batches the images:
import tensorflow as tf
from tensorflow.keras.utils import image_dataset_from_directory  # tensorflow.keras.preprocessing in older TF versions

def prepare_images(dir):
    training_data = image_dataset_from_directory(
        dir,
        validation_split=0.2,
        subset="training",
        image_size=IMG_SHAPE,
        batch_size=BATCH_SIZE,
        shuffle=True,
        seed=123
    )
    validation_data = image_dataset_from_directory(
        dir,
        validation_split=0.2,
        subset="validation",
        image_size=IMG_SHAPE,
        batch_size=BATCH_SIZE,
        shuffle=True,
        seed=123  # same seed as the training subset so the two splits don't overlap
    )
    # create test data set using 20% of validation data
    val_batches = tf.data.experimental.cardinality(validation_data)
    test_data = validation_data.take(val_batches // 5)
    validation_data = validation_data.skip(val_batches // 5)
    return training_data, validation_data, test_data
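For context, IMG_SHAPE and BATCH_SIZE are module-level constants used by prepare_images; the values below are placeholders to show the shape of the configuration, not the exact numbers I train with:

IMG_SHAPE = (160, 160)  # (height, width) passed as image_size
BATCH_SIZE = 32         # batch size used when the datasets are built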
Here is how I fit the model:
def train_model(model, train, val, test, epochs, cb, class_weights, model_fp):
    history = model.fit(
        train,
        validation_data=val,
        epochs=epochs,
        # the dataset is already batched, so batch_size is not passed to fit
        callbacks=cb,
        class_weight=class_weights,
        verbose=1
    )
    return history
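The pieces are wired together roughly like this (the directory, callbacks, class weights, and model path below are placeholders, not my real values):

train_ds, val_ds, test_ds = prepare_images('data/rodents')
history = train_model(model, train_ds, val_ds, test_ds, epochs=50,
                      cb=[], class_weights=None, model_fp='models/rodents.h5')
evaluate_model(history, 'models/rodents.h5', model, train_ds, test_ds)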
So when I evaluate the model with the test dataset, the output is as follows:
loss : 0.1926884800195694
tp : 197.0
fp : 20.0
tn : 159.0
fn : 8.0
accuracy : 0.9270833134651184
precision : 0.9078341126441956
recall : 0.9609755873680115
auc : 0.979574978351593
prc : 0.9817492961883545
However, when calling predict on that same test dataset and building the confusion matrix from those predictions...
58/58 [==============================] - 6s 91ms/step
3/3 [==============================] - 1s 142ms/step
Legitimate Transactions Detected (True Negatives): 98
Legitimate Transactions Incorrectly Detected (False Positives): 92
Fraudulent Transactions Missed (False Negatives): 83
Fraudulent Transactions Detected (True Positives): 111
Total Fraudulent Transactions: 194
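The counts above come from comparing the dataset predictions with the labels pulled out of the dataset earlier, roughly like this (a sketch; the use of sklearn.metrics.confusion_matrix and the 0.5 threshold are my assumptions about the plotting helper, not its exact code):

from sklearn.metrics import confusion_matrix

# threshold the sigmoid outputs and compare against the extracted labels
predicted = (test_predictions.ravel() > 0.5).astype(int)
tn, fp, fn, tp = confusion_matrix(test_labels, predicted).ravel()
print('Legitimate Transactions Detected (True Negatives):', tn)
print('Legitimate Transactions Incorrectly Detected (False Positives):', fp)
print('Fraudulent Transactions Missed (False Negatives):', fn)
print('Fraudulent Transactions Detected (True Positives):', tp)
print('Total Fraudulent Transactions:', tp + fn)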
And when individually predicting 10 images, 9 of them are correct.
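For reference, the single-image predictions are done along these lines (the file path is a placeholder and the exact loading code may differ from what I actually use):

from tensorflow.keras.utils import load_img, img_to_array

img = load_img('samples/rodent_01.jpg', target_size=IMG_SHAPE)  # placeholder path
arr = np.expand_dims(img_to_array(img), axis=0)  # add a batch dimension of 1
score = model.predict(arr)
print('rodent' if score[0][0] > 0.5 else 'not a rodent', score[0][0])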
So why does model.predict seem to guess at random when given a batched dataset, but not when given individual images? Is there a workaround? Thanks
OK, so I figured it out. I am not sure if this is an error in the documentation or a bug in TensorFlow, because the docs clearly state you can pass a tf.data dataset as the x input, yet when I do, my model appears to guess at random. I even made sure not to specify a batch size, as the docs say. The workaround is to split the dataset into image data and label data as follows:
x_batches, y_batches = [], []
for image_batch, labels_batch in test:  # collect every batch, not just the last one
    x_batches.append(image_batch.numpy())
    y_batches.append(labels_batch.numpy())
x_test = np.concatenate(x_batches)
y_test = np.concatenate(y_batches)
and then call predict with just x_test, and plot the metrics from those predictions against y_test:
test_predictions = model.predict(x_test)
plot_cm(test_predictions, y_test, p, metrics_path) # function to plot confusion matrix
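Since x_test and y_test are collected in a single pass over the dataset, they line up with each other, so the metrics computed from them agree with the predictions. For example, you can also evaluate directly on the extracted arrays (a sketch, assuming the same compiled metrics):

# metrics computed directly on the extracted arrays
results = model.evaluate(x_test, y_test, verbose=0)
for name, value in zip(model.metrics_names, results):
    print(name, ':', value)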