Tags: image, pytorch, record, tfrecord

PyTorch: wrapping multiple records in one file?


Is there a standard way of encoding multiple records (in this case, data from multiple .png or .jpeg images) in one file that PyTorch can read? Something similar to TensorFlow's "TFRecord" or MXNet's "RecordIO", but for PyTorch.

I need to download image data from S3 for inference, and it's much slower when the images are spread across many small .jpg files than when they are packed into a few larger files.

Thanks.


Solution

  • One option is to store batches of images together in a single .npz file. NumPy's np.savez saves multiple arrays into one (uncompressed) file, and np.savez_compressed does the same with compression. Load the file back with np.load, which gives you the arrays by name, then convert them to tensors with torch.from_numpy.
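A minimal sketch of this approach, assuming the images are already decoded into NumPy arrays and that the packed file is handled as raw bytes in memory (for example, the body of a single S3 object). The helper names `pack_images`, `unpack_images` and `to_tensors` are hypothetical. It uses `np.savez_compressed`; swapping in `np.savez` gives the uncompressed variant.

```python
import io

import numpy as np


def pack_images(images: dict[str, np.ndarray]) -> bytes:
    """Serialize several image arrays into one compressed .npz blob."""
    buf = io.BytesIO()
    # Each keyword becomes a named array inside the zip archive.
    # Use simple keys such as "img_000" so they can't clash with
    # savez_compressed's own parameter names (e.g. "file").
    np.savez_compressed(buf, **images)
    return buf.getvalue()


def unpack_images(blob: bytes) -> dict[str, np.ndarray]:
    """Read every array back out of an .npz blob (e.g. downloaded from S3)."""
    # np.load accepts any seekable file-like object, so no temp file is needed.
    with np.load(io.BytesIO(blob)) as archive:
        # Arrays are read lazily, so materialize them before the archive closes.
        return {name: archive[name] for name in archive.files}


def to_tensors(arrays: dict[str, np.ndarray]) -> dict:
    """Convert the unpacked arrays to PyTorch tensors."""
    import torch  # imported here so the NumPy helpers also work without PyTorch

    # torch.from_numpy shares memory with each array instead of copying it.
    return {name: torch.from_numpy(arr) for name, arr in arrays.items()}
```

Usage: build a dict such as `{"img_000": arr0, "img_001": arr1}`, upload `pack_images(...)` as one object, and on the inference side call `to_tensors(unpack_images(body))`. Keep in mind that decoded pixel arrays are usually much larger than the original JPEG files, even after zlib compression, so it may be worth comparing file sizes before switching.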