In the TensorFlow Object Detection API, the documentation advocates sharding if the dataset contains "more than a few thousand examples".
"A few thousand" is a bit vague, and it would be nice to have a more precise answer, such as a target file size. In other words, how large can a .record file get before it starts causing performance issues? What file size should we aim for when sharding our data?
It seems the TensorFlow team recommends shards of roughly 100 MB: https://www.tensorflow.org/guide/performance/overview. You might also consider the performance implications of batch size during training: https://www.pugetsystems.com/labs/hpc/GPU-Memory-Size-and-Deep-Learning-Performance-batch-size-12GB-vs-32GB----1080Ti-vs-Titan-V-vs-GV100-1146/
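As a rough illustration (not an official recipe), one way to apply the ~100 MB guideline is to derive the shard count from the total serialized size of your examples and write them with the usual `name-00000-of-0000N` convention. The helper below is a sketch; `write_sharded_tfrecords`, `target_shard_bytes`, and the assumption that `examples` is a list of already-serialized `tf.train.Example` protos are all hypothetical choices for this example.

```python
import math
import tensorflow as tf

def write_sharded_tfrecords(examples, output_base, target_shard_bytes=100 * 1024 * 1024):
    """Write serialized tf.train.Example protos into shards of ~target_shard_bytes each.

    `examples` is assumed to be a list of bytes (serialized Examples);
    `output_base` is a path prefix such as "train.record".
    """
    # Pick the number of shards so that each one lands near the ~100 MB target.
    total_bytes = sum(len(e) for e in examples)
    num_shards = max(1, math.ceil(total_bytes / target_shard_bytes))

    # Open one writer per shard, using the standard -00000-of-00010 style suffix.
    writers = [
        tf.io.TFRecordWriter(f"{output_base}-{i:05d}-of-{num_shards:05d}")
        for i in range(num_shards)
    ]
    try:
        # Round-robin assignment keeps shard sizes roughly equal.
        for idx, serialized in enumerate(examples):
            writers[idx % num_shards].write(serialized)
    finally:
        for w in writers:
            w.close()
```

The filename pattern mirrors what the Object Detection API's dataset creation scripts produce, so (assuming your input config uses a matching glob) the resulting shards should drop in without further changes.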