Should a TFRecord contain multiple observations or one?

In the explanation I have seen, a single TFRecord contains multiple classes and multiple images (a cat and a bridge). Both images are written into one TFRecord, and reading it back confirms that the TFRecord contains two images.

Elsewhere I have seen people generate one TFRecord per image. I know you can load multiple TFRecord files like this:

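# TFRecordDataset takes file names, not a glob pattern, so expand the pattern first.
train_dataset = tf.data.TFRecordDataset(tf.io.gfile.glob("<Path>/*.tfrecord"))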

But which way is recommended? Should I build one TFRecord per image, or one TFRecord for multiple images? If I put multiple images into one TFRecord, what is the maximum number?

Solution

As you said, it is possible to save an arbitrary number of entries in a single TFRecord file, and one can create as many TFRecord files as desired.

I would recommend using practical considerations to decide how to proceed:

On one hand, try to use fewer TFRecord files, since fewer files are easier to handle and to move around in the filesystem.
On the other hand, avoid letting a TFRecord file grow to a size that becomes a problem for the filesystem.
Keep in mind that it is useful to keep separate TFRecord files for the train / validation / test splits.
Sometimes the nature of the dataset makes it obvious how to split it into separate files (for example, I have a video dataset where I use one TFRecord file per participant session).
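
To make the trade-off above concrete, here is a minimal sketch of how one might plan the files for a split before writing anything. It is only an illustration under stated assumptions: the helper name plan_shards, the examples_per_shard parameter, and the "<split>-<index>-of-<count>.tfrecord" naming scheme are all hypothetical choices, not something TensorFlow requires.

```python
import math


def plan_shards(split, num_examples, examples_per_shard):
    """Return (filename, start, stop) tuples covering all examples of one split.

    Hypothetical helper: each tuple says which slice of the examples
    [start, stop) should be written into which TFRecord file.
    """
    if num_examples <= 0 or examples_per_shard <= 0:
        raise ValueError("num_examples and examples_per_shard must be positive")

    # Number of files needed so that no file holds more than examples_per_shard.
    num_shards = math.ceil(num_examples / examples_per_shard)

    plan = []
    for shard in range(num_shards):
        start = shard * examples_per_shard
        stop = min(start + examples_per_shard, num_examples)
        # Assumed naming scheme: <split>-<shard index>-of-<shard count>.tfrecord
        filename = f"{split}-{shard:05d}-of-{num_shards:05d}.tfrecord"
        plan.append((filename, start, stop))
    return plan


# One plan per split keeps train / validation / test in separate files.
for filename, start, stop in plan_shards("train", 2500, 1000):
    print(filename, start, stop)
```

Each planned file could then be written with one tf.io.TFRecordWriter(filename), calling write() once per serialized example in its slice, and the resulting list of file names passed to tf.data.TFRecordDataset. The right value for examples_per_shard depends on the size of each example and on your storage, so treat the numbers here as placeholders.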