I'm training an LSTM model on a GPU using TensorFlow Keras. When I call model.fit(), it takes about 30 minutes before training actually starts. I think that during this period it is preparing the data using a single CPU core. Am I correct? And if so, how can I parallelize this data preparation across multiple cores?
If you use tf.data.Dataset to prepare your data, you can exploit a few parameters while building the input pipeline:

1. In the .map() function, set num_parallel_calls=tf.data.experimental.AUTOTUNE. This lets the program automatically choose the number of CPU cores to use for dataset preparation.

2. Call .batch().shuffle() in this order rather than .shuffle().batch(). In the first case the shuffle buffer holds already-formed batches, so you shuffle the order of the batches; in the second case the entire dataset is shuffled element by element before batching. Shuffling an enormous dataset element-wise takes far longer than batching first and shuffling the batches. (Note that the two orders are not equivalent: batching first shuffles only the order of the batches, not the elements within them.)

3. Add a .prefetch() operation.
With prefetching, the GPU does not idle while waiting for the CPU to fetch the next batch of data: as soon as backpropagation has finished updating the weights for one batch, the GPU immediately consumes the next one. For simplicity, also set its value to tf.data.experimental.AUTOTUNE.
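Putting the three points together, a minimal pipeline might look like the sketch below. The dataset, shapes, and the preprocess function are made-up placeholders; only the .map(num_parallel_calls=...), .batch().shuffle(), and .prefetch() calls are the parts the answer is about.

```python
import tensorflow as tf

# Hypothetical toy data: 1024 samples of shape (20, 8) with integer labels.
features = tf.random.normal([1024, 20, 8])
labels = tf.random.uniform([1024], maxval=2, dtype=tf.int32)

def preprocess(x, y):
    # Placeholder per-element transformation; with num_parallel_calls set,
    # tf.data runs this across multiple CPU cores in parallel.
    return tf.cast(x, tf.float32) / 2.0, y

dataset = (
    tf.data.Dataset.from_tensor_slices((features, labels))
    # 1. parallel preprocessing, core count chosen automatically
    .map(preprocess, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    # 2. batch first, then shuffle the batches (order recommended above)
    .batch(32)
    .shuffle(buffer_size=8)
    # 3. overlap data preparation on the CPU with training on the GPU
    .prefetch(tf.data.experimental.AUTOTUNE)
)

# model.fit(dataset, epochs=...) would then consume batches without stalling.
```

Since every stage here is part of the tf.data pipeline, the expensive preparation happens on the fly and in parallel instead of in one long single-core pass before training begins.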