I would like to supply to a network many training images that are sampled from a dataset by following certain sampling rules. Now I have two choices:
Use the sampling logic to generate a list of images offline, then convert the .lst file to .rec file and use an sequential DataIter to access it.
Write my own child class of DataIter that can sample the images online. As a result, the class need to support random access, maybe inheriting from MXIndexedRecordIO. I will need to create a .rec file for the original dataset.
My intuition tells me that sequential access will be faster than random access for a .rec file. But I don't know if the difference is big enough to worth the additional time I spend in writing and testing my own iterator class. Could anyone give me a hint on this?
In your case you are better off prepacking images using MXRecordIO. It will give you a boost of performance and also introduce consistency in how you handle the dataset.
It will store the files in a .rec file as a list, where order matters
You can then use mxnet.image.ImageIter to iterate over .rec in order.