Here is my test generator code:
test_generator = test_datagen.flow_from_dataframe(
    dataframe=df_test,
    directory=img_dir,
    x_col="filename",
    y_col="label",
    batch_size=32,
    seed=42,
    shuffle=False,
    class_mode="categorical",
    target_size=(img_size, img_size))
Why does the batch_size parameter still matter after the generator has been created? When I create it, Keras reports:

Found 229 validated image filenames belonging to 2 classes.

Yet the arrays the generator returns are capped at 32 elements, the batch size:
x_test, y_test = test_generator.next()
Here is the shape of x_test, which I assume is the array holding the actual image data:
>>> print(x_test.shape)
(32, 224, 224, 3)
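For what it's worth, the iterator just advances one batch per call; with 229 samples and a batch size of 32 that works out to 8 batches per epoch (7 full batches, then a final batch of 5):

print(len(test_generator))    # 8 -- number of batches per epoch
x_next, y_next = test_generator.next()
print(x_next.shape)           # (32, 224, 224, 3) -- the second batch of 32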
This is the result when I compare it to the length of the predictions:
print(len(x_test)) #32
print(len(y_test)) #32
print(len(pred)) #229
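For context, pred here presumably comes from something like the following (the call isn't shown in the question, so model is an assumption); model.predict drains the generator batch by batch, which is why it covers all 229 samples:

test_generator.reset()                # rewind so prediction starts at batch 1
pred = model.predict(test_generator)  # internally iterates over all 8 batches
print(pred.shape)                     # (229, 2) -- one probability row per sample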
Since the size of y_test is vastly different from the number of predictions, I'm having difficulty doing any sort of comparison. The y_test array comes directly from the test_generator, which has its batch size set to 32.
The test generator's labels seem to have the right number of elements:
test_generator.labels
[0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0........
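And the count matches the 229 filenames the generator found:

print(len(test_generator.labels))   # 229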
So why is the shape of x_test only 32 along the first axis? Am I wrong to think it should be 229, given that there are 229 samples and 229 labels?
As the docs here state, what a generator returns is:
A DataFrameIterator yielding tuples of (x, y) where x is a numpy array containing a batch of images with shape (batch_size, target_size, channels) and y is a numpy array of corresponding labels.
So test_generator is a DataFrameIterator, and each time you call it, it gives you one batch of images with shape (32, 224, 224, 3). You are indeed thinking incorrectly that it should be 229: the generator never returns all 229 samples at once. Each call yields the next batch of 32 images out of the 229 samples (the last batch of the epoch holds the remaining 5).
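If the goal is to compare pred against the ground truth, here is a minimal sketch. It assumes pred came from model.predict(test_generator); because you set shuffle=False, the order of test_generator.labels matches the order of the predictions:

import numpy as np

# shuffle=False keeps label order aligned with prediction order
y_true = np.array(test_generator.labels)   # all 229 integer labels
y_pred = np.argmax(pred, axis=1)           # 229 predicted class indices
print((y_true == y_pred).mean())           # fraction of correct predictions

# Alternatively, rebuild the full arrays by draining every batch:
test_generator.reset()                     # rewind to the first batch
xs, ys = zip(*(test_generator.next() for _ in range(len(test_generator))))
x_all = np.concatenate(xs)                 # shape (229, 224, 224, 3)
y_all = np.concatenate(ys)                 # shape (229, 2), one-hot labels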