Search code examples
tensorflowkerasdata-augmentation

How to control the amount of the augmented training data with the augmentation layers integrated into the model


As you may know, recent versions of tensorflow/keras allowed the data augmentation layers integrated into the model. This feature of the API is an excellent option, especially when you want to apply image augmentation on a part of inputs (image) for a model with multimodal inputs and different sub-networks for different inputs. And the test accuracy with this augmentation increased to 3-5% in comparison with no augmentation.

But I can't figure out how many training samples were used in the actual training with this augmentation method. For simplicity, let's assume I am passing a list of numpy arrays as the inputs of the model when fitting the model. For example, if I have 1000 training cases for a model with the augmentation layers, will 1000 training cases with transformed images be used in training? If not, how many?

I tried to search all related sites (tutorials and documentation) for an answer to this simple question in vain.


Solution

  • I think I found the answer. Based on the training log of the model, the augmentation layers do not produce additional images but randomly transform the original images. To increase generated data amount, a user has to provide multiple copies of original training data as input to the model.