I'm training an LSTM model on a GPU using TensorFlow Keras. When I call model.fit(), it takes about 30 minutes before training actually starts. I think that during this period it is preparing the data using a single CPU core. Am I correct? And if so, how can I parallelize this data preparation across multiple cores?
If you use tf.data.Dataset to prepare your data, you can exploit a few parameters while building the input pipeline:

1. In the .map() function, set num_parallel_calls=tf.data.experimental.AUTOTUNE. This lets the program automatically choose the number of CPU cores to use for dataset preparation.

2. Call .batch().shuffle() in this order rather than .shuffle().batch(). In the first case the shuffle buffer holds already-formed batches, so you shuffle the order of the batches; in the second case the entire dataset is shuffled element by element before batching. Shuffling an enormous dataset element-wise takes far longer than batching first and shuffling the batches. (Note that the two orders are not equivalent: batching first shuffles only the order of the batches, not the elements within them.)

3. Add a .prefetch() operation.
With prefetching, the GPU does not idle while waiting for the CPU to fetch the next batch of data: as soon as backpropagation has finished updating the weights for one batch, the GPU immediately consumes the next one. For simplicity, also set its value to tf.data.experimental.AUTOTUNE.
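Putting the three points together, a minimal pipeline might look like the sketch below. The dataset, shapes, and the preprocess function are made-up placeholders; only the .map(num_parallel_calls=...), .batch().shuffle(), and .prefetch() calls are the parts the answer is about.

```python
import tensorflow as tf

# Hypothetical toy data: 1024 samples of shape (20, 8) with integer labels.
features = tf.random.normal([1024, 20, 8])
labels = tf.random.uniform([1024], maxval=2, dtype=tf.int32)

def preprocess(x, y):
    # Placeholder per-element transformation; with num_parallel_calls set,
    # tf.data runs this across multiple CPU cores in parallel.
    return tf.cast(x, tf.float32) / 2.0, y

dataset = (
    tf.data.Dataset.from_tensor_slices((features, labels))
    # 1. parallel preprocessing, core count chosen automatically
    .map(preprocess, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    # 2. batch first, then shuffle the batches (order recommended above)
    .batch(32)
    .shuffle(buffer_size=8)
    # 3. overlap data preparation on the CPU with training on the GPU
    .prefetch(tf.data.experimental.AUTOTUNE)
)

# model.fit(dataset, epochs=...) would then consume batches without stalling.
```

Since every stage here is part of the tf.data pipeline, the expensive preparation happens on the fly and in parallel instead of in one long single-core pass before training begins.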