image-processing keras generator

Why does a Keras test generator return only batch_size samples in the array it yields?


Here is my test generator code:

test_generator=test_datagen.flow_from_dataframe(
                      dataframe=df_test,
                      directory=img_dir,
                      x_col="filename",
                      y_col="label",
                      batch_size=32,
                      seed=42,
                      shuffle=False,
                      class_mode="categorical",
                      target_size=(img_size,img_size))

Why does the batch_size parameter still matter after the generator is created? Creating it reports all 229 images:

Found 229 validated image filenames belonging to 2 classes.

For example, the array I get from the generator has only 32 entries along its first axis, which is the batch size:

x_test, y_test = test_generator.next()

Here is the shape of x_test, which I assume is the array holding the actual image data:

>>> print(x_test.shape)
(32, 224, 224, 3)

This is the result when I compare it to the length of the predictions:

print(len(x_test))  #32
print(len(y_test))  #32
print(len(pred))    #229

Since y_test has 32 entries while the predictions have 229, I can't compare them directly. y_test comes straight from test_generator, which has its batch size set to 32.

The test generator's labels seem to have the right number of elements:

test_generator.labels

[0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0........

So why does x_test contain only 32 samples? I assumed it would contain 229, since there are 229 samples and 229 labels, but that is evidently wrong.


Solution

  • As the Keras documentation for flow_from_dataframe states, the generator returns:

    A DataFrameIterator yielding tuples of (x, y) where x is a numpy array containing a batch of images with shape (batch_size, target_size, channels) and y is a numpy array of corresponding labels.

    So test_generator is a DataFrameIterator: each time you advance it (for example with test_generator.next()), it yields one batch of images with shape (32, 224, 224, 3), not the whole dataset. That is why x_test has 32 entries rather than 229. The 229 samples are split into 8 batches: seven of 32 images and a final one of 5.
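To make the batching concrete, here is a minimal sketch that needs only NumPy. It does not use Keras at all: iter_batches is a hypothetical stand-in for an unshuffled generator, the 8x8 images are fake data, and pred is a random placeholder for the model's predictions. It only illustrates how 229 samples split into batches of 32 and how the batches can be stitched back together so that labels and predictions have the same length.

```python
import math
import numpy as np


def iter_batches(x, y, batch_size):
    """Hypothetical stand-in for a generator with shuffle=False:
    yields consecutive (x, y) batches in their original order."""
    for start in range(0, len(x), batch_size):
        yield x[start:start + batch_size], y[start:start + batch_size]


n_samples, batch_size, n_classes = 229, 32, 2
rng = np.random.default_rng(42)

# Fake data: small 8x8 RGB "images" instead of 224x224 to keep this light.
x_all = rng.random((n_samples, 8, 8, 3), dtype=np.float32)
class_ids = rng.integers(0, n_classes, size=n_samples)
# One-hot labels, similar in shape to what class_mode="categorical" yields.
y_all = np.eye(n_classes, dtype=np.float32)[class_ids]

batch_shapes, x_parts, y_parts = [], [], []
for x_batch, y_batch in iter_batches(x_all, y_all, batch_size):
    batch_shapes.append(x_batch.shape[0])  # number of samples in this batch
    x_parts.append(x_batch)
    y_parts.append(y_batch)

n_batches = math.ceil(n_samples / batch_size)

# Stitch the batches back together to recover all 229 samples.
x_test = np.concatenate(x_parts)
y_test = np.concatenate(y_parts)

# Placeholder for the model output: one row of class scores per sample.
pred = rng.random((n_samples, n_classes))

# Convert one-hot labels and class scores to class indices, then compare.
y_true = y_test.argmax(axis=1)
y_pred = pred.argmax(axis=1)
accuracy = float((y_true == y_pred).mean())

print(n_batches, batch_shapes)
print(x_test.shape, y_test.shape, pred.shape)
```

With the real generator the same idea should apply, assuming shuffle=False as in the question: either collect len(test_generator) batches and concatenate them (the iterator keeps cycling, so the loop has to stop after that many steps), or skip the images entirely and compare the predictions against test_generator.labels, which already holds all 229 class indices in file order.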