How to open and select a subset of images from a .rec file in Python?

I have a folder containing the following files: train.idx, train.rec, property, lfw.bin, cfp_fp.bin, agedb_30.bin

This folder contains face images.

I have already used ImageRecordIter in the following code.

However, every time I print the first element of train_data, I get a different image.

import os
import mxnet as mx

# rec_path is the folder that contains train.rec and train.idx
train_data = mx.io.ImageRecordIter(
    path_imgrec=os.path.join(rec_path, 'train.rec'),
    path_imgidx=os.path.join(rec_path, 'train.idx'),
    label_width=2,
    data_shape=(3, 112, 112),
    batch_size=10,
    shuffle=False)

My questions are:

1) I do not know how data is usually stored in these types of files, e.g. which one contains labels. Any idea about these types of files?

2) How can I extract a subset of data to make a sample file? Also, what would be the file format (e.g. pickle file, txt file)?

Solution

You're only passing train.rec and train.idx to ImageRecordIter, so those are the only files being read. The labels are stored alongside the image data in train.rec; train.idx is an index that maps each record key to its position in train.rec. You could use MXIndexedRecordIO to extract a subset of records from these files and write them back out in the same .rec/.idx format. Something like:

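import mxnet as mx

# Read a few records from the original dataset.
reader = mx.recordio.MXIndexedRecordIO('train.idx', 'train.rec', 'r')
keys = reader.keys[:5]  # use keys that actually exist in train.idx
samples = [reader.read_idx(k) for k in keys]
reader.close()

# Write them to a separate pair of files so the originals are not overwritten.
writer = mx.recordio.MXIndexedRecordIO('tmp.idx', 'tmp.rec', 'w')
for i, sample in enumerate(samples):
    writer.write_idx(i, sample)
writer.close()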
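
If you want a random subset rather than the first few records, and want to see which labels each record carries, a minimal sketch could look like the following. The helper names pick_keys and copy_subset and the output file names are hypothetical, chosen only for illustration. It assumes every key in the index points at a record that mx.recordio.unpack can read.

```python
import random


def pick_keys(keys, n, seed=0):
    """Return n distinct keys chosen at random, in sorted order."""
    rng = random.Random(seed)  # fixed seed gives the same subset on every run
    return sorted(rng.sample(list(keys), n))


def copy_subset(src_idx, src_rec, dst_idx, dst_rec, n, seed=0):
    """Copy n randomly chosen records into a new .idx/.rec pair.

    Returns the labels found in the copied records.
    """
    import mxnet as mx  # imported here so pick_keys works without MXNet

    reader = mx.recordio.MXIndexedRecordIO(src_idx, src_rec, 'r')
    writer = mx.recordio.MXIndexedRecordIO(dst_idx, dst_rec, 'w')
    labels = []
    for new_key, key in enumerate(pick_keys(reader.keys, n, seed)):
        raw = reader.read_idx(key)
        # unpack splits a record into its header and the encoded image bytes;
        # header.label holds the label (a float or an array of floats).
        header, _ = mx.recordio.unpack(raw)
        labels.append(header.label)
        writer.write_idx(new_key, raw)  # copy the record unchanged
    writer.close()
    reader.close()
    return labels
```

For example, copy_subset('train.idx', 'train.rec', 'sample.idx', 'sample.rec', n=100) would write 100 records and return their labels. Some datasets may keep a metadata record under one key instead of an image; if yours does, filter that key out before sampling.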