Suppose I have something like the following:
    from keras.preprocessing.image import ImageDataGenerator

    image_data_generator = ImageDataGenerator(rescale=1./255)
    train_generator = image_data_generator.flow_from_directory(
        'my_directory',
        target_size=(28, 28),
        batch_size=32,
        class_mode='categorical'
    )
Then my `train_generator` is filled with data from `my_directory`, which contains two subfolders that separate the data into classes `0` and `1`.
Suppose I also have another directory, `that_directory`, with data likewise split into classes `0` and `1`. I want to augment my `train_generator` with this additional data.
Running `train_generator = image_data_generator.flow_from_directory('that_directory', ...)` replaces the generator, so the data from `my_directory` is no longer included.
Is there a way to combine both sets of data into one generator, or into an object that operates like a `DirectoryIterator`, without changing the folder structure itself?
Just combine the generators in another generator, optionally with different augmentation configs:
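    # idg1_configs / idg2_configs: dicts of augmentation options, one per data source
    idg1 = ImageDataGenerator(**idg1_configs)
    idg2 = ImageDataGenerator(**idg2_configs)

    g1 = idg1.flow_from_directory('idg1_dir',...)
    g2 = idg2.flow_from_directory('idg2_dir',...)

    def combine_gen(*gens):
        # Round-robin forever: one batch from each generator in turn
        while True:
            for g in gens:
                yield next(g)

    # ...
    # Note: newer tf.keras releases deprecate fit_generator; model.fit accepts generators too
    model.fit_generator(combine_gen(g1, g2), steps_per_epoch=len(g1)+len(g2), ...)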
This alternately yields batches from `g1` and `g2`.
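The alternation can be checked without loading any images. The sketch below swaps `g1` and `g2` for stand-in generators; `fake_batches` is a hypothetical helper that only mimics a never-ending Keras iterator by cycling over labelled strings.

```python
from itertools import cycle, islice

def fake_batches(name, n_batches):
    """Hypothetical stand-in for a Keras iterator: loops over its batches forever."""
    return cycle([f"{name}-batch{i}" for i in range(n_batches)])

def combine_gen(*gens):
    # Round-robin forever: one batch from each generator in turn
    while True:
        for g in gens:
            yield next(g)

g1 = fake_batches("g1", 2)  # pretend len(g1) == 2
g2 = fake_batches("g2", 3)  # pretend len(g2) == 3

# Pull six batches and look at the order in which the sources are visited
first_six = list(islice(combine_gen(g1, g2), 6))
print(first_six)
```

One consequence of strict alternation: when the two sources have different lengths, an epoch of `len(g1)+len(g2)` steps does not visit every batch exactly once. The shorter source wraps around early and the longer one is not fully covered within that epoch.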
You might be tempted to use `itertools.chain` instead, but it does not work here: the generators created by `ImageDataGenerator` never terminate and keep producing batches indefinitely, so `chain` would never move on from the first one. This endless behavior is what `fit_generator` expects of the generator you pass to it. From the Keras documentation:
> ...The generator is expected to loop over its data indefinitely. An epoch finishes when `steps_per_epoch` batches have been seen by the model.
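The problem with `itertools.chain` can be reproduced with plain Python. This is a minimal sketch in which `itertools.cycle` objects stand in for the never-ending Keras iterators; the batch labels are made up for illustration.

```python
from itertools import chain, cycle, islice

# Hypothetical stand-ins for two never-ending Keras iterators
g1 = cycle(["g1-a", "g1-b"])
g2 = cycle(["g2-a", "g2-b"])

# chain only advances to g2 once g1 is exhausted, which never happens
chained = chain(g1, g2)
seen = list(islice(chained, 1000))

reached_g2 = any(item.startswith("g2") for item in seen)
print(reached_g2)  # False: all 1000 items came from g1
```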
If `steps_per_epoch` is not set, it defaults to `len(generator)`, where `generator` is the object passed to `fit_generator`, provided that object reports a length. The iterators returned by `ImageDataGenerator` do, so with a single one you don't need to set `steps_per_epoch` manually. A plain Python generator such as `combine_gen(g1, g2)` has no length, though. To avoid summing the lengths by hand, you can wrap the generators in a small class that knows its combined length:

    class CombinedGen():
        def __init__(self, *gens):
            self.gens = gens

        def generate(self):
            # Round-robin forever: one batch from each generator in turn
            while True:
                for g in self.gens:
                    yield next(g)

        def __len__(self):
            # Total number of batches across all wrapped generators
            return sum([len(g) for g in self.gens])

    # usage:
    cg = CombinedGen(g1, g2)
    # cg.generate() is a plain generator without a length, so pass len(cg) explicitly
    model.fit_generator(cg.generate(), steps_per_epoch=len(cg), ...)
You can also add `__next__` and/or `__iter__` methods to the `CombinedGen` class if you want to iterate over its instances directly (instead of iterating over `cg.generate()`).
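One way to add those methods is sketched below. It is an illustration rather than a drop-in Keras component: `FakeIterator` is a hypothetical stand-in for a `DirectoryIterator` (endless, but with a length), and the `_stream` attribute is a name introduced here to hold the running round-robin generator.

```python
from itertools import islice

class FakeIterator:
    """Hypothetical stand-in for a DirectoryIterator: endless, but reports its length."""
    def __init__(self, name, n_batches):
        self.name, self.n_batches, self.i = name, n_batches, 0

    def __len__(self):
        return self.n_batches

    def __iter__(self):
        return self

    def __next__(self):
        batch = f"{self.name}-batch{self.i % self.n_batches}"
        self.i += 1
        return batch

class CombinedGen:
    def __init__(self, *gens):
        self.gens = gens
        self._stream = self.generate()  # single shared round-robin stream

    def generate(self):
        while True:
            for g in self.gens:
                yield next(g)

    def __len__(self):
        return sum(len(g) for g in self.gens)

    def __iter__(self):
        return self

    def __next__(self):
        # Delegate to the running generator so iteration state is preserved
        return next(self._stream)

cg = CombinedGen(FakeIterator("g1", 2), FakeIterator("g2", 3))
one_epoch = list(islice(cg, len(cg)))  # len(cg) batches make up one epoch
print(one_epoch)
```

This sketch does not check whether a particular Keras version will infer `steps_per_epoch` from such an object, so passing `steps_per_epoch=len(cg)` explicitly remains the safe choice.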