I am training a VAE model with 9100 images (each of size 256 x 64) on an Nvidia RTX 3080. First, I load all the images into a numpy array of size 9100 x 256 x 64 called traindata. Then, to form a dataset for training, I use

    train_dataset = (tf.data.Dataset.from_tensor_slices(traindata)
                     .shuffle(len(traindata))
                     .batch(batch_size))

Here I use a batch_size of 65. I mainly have two questions about the things that I see during training:
1. According to the docs, the whole dataset is re-shuffled for every epoch. However, training is very slow this way (around 50 seconds per epoch). I did a comparison against training without shuffling, i.e. not calling .shuffle(len(traindata)) when creating the dataset, and training is much faster (around 20 s/epoch). I am wondering why the .shuffle() operation is so slow and whether there are any methods to make it faster. According to this StatsSE thread, shuffling is quite important for training, and that's why I include the shuffle operation.
2. When I call .shuffle() while creating the dataset, TensorFlow always gives the following message:

    I tensorflow/core/platform/windows/subprocess.cc:308] SubProcess ended with return code: 4294967295

I have searched online but still cannot understand what it means. Does it indicate an error, or is it just a warning that I can ignore?
That's because holding all elements of your dataset in the buffer is expensive. Unless you absolutely need perfect randomness, you should use a smaller buffer_size. All elements will eventually be taken, but in a more deterministic manner.
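In your pipeline, that just means passing a smaller value to .shuffle(). A minimal sketch, assuming the same traindata and batch_size as in the question (1024 is an arbitrary illustrative value, not a tuned one, and the prefetch call is optional):

    train_dataset = (tf.data.Dataset.from_tensor_slices(traindata)
                     .shuffle(1024)                  # buffer holds 1024 samples instead of all 9100
                     .batch(batch_size)
                     .prefetch(tf.data.AUTOTUNE))    # optional: overlap the input pipeline with training

Filling a 1024-element buffer is far cheaper than staging all 9100 images, which is presumably where most of the extra time per epoch was going.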
This is what is going to happen with a smaller buffer_size, say 3. The buffer is shown in brackets, TensorFlow samples a random value from that buffer, and the element it picks is marked with ^:

    1) [1 2 3] 4 5 6 7 8 9
          ^
    2) [1 3 4] 5 6 7 8 9
            ^
    3) [1 3 5] 6 7 8 9
            ^
    4) [1 3 6] 7 8 9
        ^
    5) [3 6 7] 8 9
    Etc.
So, earlier values will be taken earlier in your epoch, but you will still have some shuffling done, and all samples will eventually be taken.
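You can see this behaviour directly with a tiny, self-contained sketch (buffer_size=3 just mirrors the diagram above; the fixed seed is only there to make the run repeatable):

    import tensorflow as tf

    # Shuffle the integers 0..8 with a 3-element buffer, as in the diagram.
    ds = tf.data.Dataset.range(9).shuffle(buffer_size=3, seed=0)
    print(list(ds.as_numpy_iterator()))
    # Low numbers tend to come out early, but every element appears exactly
    # once, so you get a partial shuffle rather than a uniform one.

Run it a few times without the seed and you will see the order change, while the tendency for early elements to come out early stays.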
tl;dr: reduce buffer_size by a lot.