image, tensorflow, deep-learning, google-colaboratory, tpu

RuntimeError: Failed to serialize message


I'm trying to use a TPU in Google Colab. After doing some preprocessing in NumPy, I'm converting the data into TensorFlow format with train_setx = tf.data.Dataset.from_tensor_slices(trainx), where trainx holds 90k images of size 225*225*1. When I run this code, I get the following error:

    RuntimeError: Failed to serialize message

But if I convert another dataset of 10k images, it works, and I saw on GitHub that the failure is caused by the dataset being too large. Is that so? If so, how can I convert my dataset of 90k images?


Solution

This comes down to memory: tf.data.Dataset.from_tensor_slices embeds the entire NumPy array as a constant in the TensorFlow graph, and the serialized protocol-buffer message that carries the graph is capped at 2 GB. A 90k-image array easily exceeds that cap.
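A rough size check makes the limit concrete (a sketch assuming uint8 pixels; the shape comes from the question):

    import numpy as np

    shape = (90000, 225, 225, 1)  # 90k grayscale 225x225 images
    size_gb = np.prod(shape, dtype=np.int64) * np.dtype(np.uint8).itemsize / 1e9
    print(f"{size_gb:.2f} GB")    # ~4.56 GB, well above the ~2 GB protobuf cap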

If you only want to train the model, you can feed the NumPy arrays to it directly. Otherwise, you can use TFRecord instead. TFRecord helps you read data efficiently: you serialize your data and store it in a set of files (100-200 MB each) that can each be read linearly. This is especially useful when the data is streamed over a network, and it can also help with caching any data preprocessing. The TFRecord format itself is a simple format for storing a sequence of binary records; a writer sketch follows.
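Here is a minimal sketch of writing such shards (the array name trainx comes from the question; the file prefix and shard size are illustrative choices, not from the original post):

    import numpy as np
    import tensorflow as tf

    def _bytes_feature(value):
        # Wrap a raw byte string in a tf.train.Feature.
        return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

    def write_shards(images, path_prefix, images_per_shard=3000):
        # 3000 uint8 images of 225*225*1 is roughly 150 MB per file,
        # inside the 100-200 MB range suggested above.
        num_shards = int(np.ceil(len(images) / images_per_shard))
        for shard in range(num_shards):
            chunk = images[shard * images_per_shard:(shard + 1) * images_per_shard]
            with tf.io.TFRecordWriter(f"{path_prefix}-{shard:03d}.tfrecord") as writer:
                for img in chunk:
                    example = tf.train.Example(features=tf.train.Features(feature={
                        "image": _bytes_feature(img.tobytes()),
                    }))
                    writer.write(example.SerializeToString())

    # write_shards(trainx, "train")  # 90k images -> 30 shard files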

If you are working with large datasets, using a binary file format for storage can have a significant impact on the performance of your import pipeline and, as a consequence, on the training time of your model. Binary data takes up less space on disk, takes less time to copy, and can be read much more efficiently from disk. A reading sketch follows.
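Reading the shards back into a tf.data pipeline might look like this (again a sketch: the reshape matches the 225*225*1 images from the question, and the batch size is an arbitrary choice). Note that a Colab TPU generally reads TFRecord files from a Google Cloud Storage bucket (gs://...), not from the local Colab disk:

    import tensorflow as tf

    def parse_example(serialized):
        # Decode one serialized tf.train.Example back into an image tensor.
        features = tf.io.parse_single_example(
            serialized, {"image": tf.io.FixedLenFeature([], tf.string)})
        image = tf.io.decode_raw(features["image"], tf.uint8)
        return tf.reshape(image, (225, 225, 1))

    files = tf.data.Dataset.list_files("train-*.tfrecord")
    train_setx = (tf.data.TFRecordDataset(files)
                  .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
                  .batch(128)
                  .prefetch(tf.data.AUTOTUNE))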

Please go through the TFRecord documentation on reading and writing images, in particular the section "Walkthrough: Reading and writing image data" (https://www.tensorflow.org/tutorials/load_data/tfrecord).

Hope this answers your question. Happy Learning.