Is there a standard way of encoding multiple records (in this case, data from multiple .png or .jpeg images) in one file that PyTorch can read? Something similar to TensorFlow's "TFRecord" or MXNet's "RecordIO", but for PyTorch.
I need to download image data from S3 for inference, and it's much slower when the data is spread across many small .jpg files than when it's packed into a few larger files.
Thanks.
One option is to store batches of images together in a single .npz file. NumPy's np.savez saves multiple arrays into a single file, and np.savez_compressed does the same with compression. You can then load the file as NumPy arrays and use torch.from_numpy to convert them to tensors.
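A minimal sketch of that approach, assuming each image has already been decoded into a uint8 NumPy array. The function names pack_images and load_images and the "img_0", "img_1", ... key scheme are made up for illustration and are not part of NumPy or PyTorch.

```python
import numpy as np
import torch


def pack_images(images, path):
    """Save a list of image arrays into one compressed .npz archive."""
    # Hypothetical key scheme: "img_0", "img_1", ... keeps the original order.
    arrays = {f"img_{i}": img for i, img in enumerate(images)}
    np.savez_compressed(path, **arrays)


def load_images(path):
    """Load every array from the archive and convert each one to a tensor."""
    with np.load(path) as archive:
        # Each lookup reads one array fully into memory, so the tensors
        # stay valid after the archive is closed.
        return [
            torch.from_numpy(archive[f"img_{i}"])
            for i in range(len(archive.files))
        ]
```

np.load also accepts a seekable file-like object, so bytes downloaded from S3 can be wrapped in io.BytesIO and passed to load_images without first writing them to disk.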