Tags: python, tensorflow, keras, artificial-intelligence

ImageDataGenerator.flow_from_directory() - random test set predictions?


I load my train and test sets for a CNN binary classification problem through ImageDataGenerator() as follows:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# datagen_train augments images and reserves 20% of TRAIN_FOLDER for validation
datagen_train = ImageDataGenerator(validation_split=0.2, zoom_range=0.2, width_shift_range=0.1, height_shift_range=0.1, horizontal_flip=True)
datagen_test = ImageDataGenerator()

train_it = datagen_train.flow_from_directory(TRAIN_FOLDER, class_mode='binary', batch_size=32, target_size=(150, 150), subset='training')
val_it = datagen_train.flow_from_directory(TRAIN_FOLDER, class_mode='binary', batch_size=32, target_size=(150, 150), subset='validation')
test_it = datagen_test.flow_from_directory(TEST_FOLDER, class_mode='binary', batch_size=32, target_size=(150, 150))

I then create my model and fit it on the training and validation sets. Afterwards, model.evaluate(test_it) returns an accuracy of 88%. The problem arises with model.predict(test_it): even though I use the exact same test set, the predictions come back in a different order every time. For example:

y_pred = model.predict(test_it)

print(y_pred)

[0, 0, 1, 1]

I then run the same code block again and the result of model.predict(test_it) is [1, 0, 0, 1]. This happens every single time without my changing anything in the code, and it blocks me from creating a confusion matrix: I can't compare the true label of each data point in the test set with y_pred, because the predictions come in a different order on each run.

Any advice as to why this is happening will be appreciated.


Solution

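  • Set shuffle=False in the flow_from_directory call that creates test_it. The default is shuffle=True, so the generator yields the test images in a new random order on every pass; with shuffle=False the predictions follow the order of test_it.filenames and test_it.classes.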
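
Once the test iterator is no longer shuffled, the confusion matrix from the question can be built by comparing the predictions with test_it.classes. The following is a minimal sketch, not the asker's actual code: the helper name binary_confusion_matrix and the 0.5 threshold are assumptions made for illustration, and it relies on model.predict returning one sigmoid probability per image, as is usual with class_mode='binary'.

```python
import numpy as np


def binary_confusion_matrix(model, test_it, threshold=0.5):
    """Return a 2x2 matrix (rows: true class, columns: predicted class).

    Hypothetical helper. It assumes test_it was created with shuffle=False,
    so that the rows of model.predict(test_it) line up with test_it.classes.
    """
    test_it.reset()  # start again from the first batch
    probs = np.asarray(model.predict(test_it)).ravel()
    y_pred = (probs > threshold).astype(int)  # assumed 0.5 cut-off
    y_true = np.asarray(test_it.classes).astype(int)

    matrix = np.zeros((2, 2), dtype=int)
    np.add.at(matrix, (y_true, y_pred), 1)  # count each (true, predicted) pair
    return matrix


# Usage with the names from the question (note shuffle=False):
# test_it = datagen_test.flow_from_directory(TEST_FOLDER, class_mode='binary',
#                                            batch_size=32, target_size=(150, 150),
#                                            shuffle=False)
# print(binary_confusion_matrix(model, test_it))
```

The same y_true and y_pred arrays could instead be passed to sklearn.metrics.confusion_matrix; NumPy is used here only to keep the sketch free of extra dependencies.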