I have a folder containing the following files: train.idx, train.rec, property, lfw.bin, cfp_fp.bin, agedb_30.bin
This folder contains face images.
I have already used ImageRecordIter in the following code.
However, every time I print the first element of train_data, I get a different image.
train_data = ImageRecordIter(
    path_imgrec=os.path.join(rec_path, 'train.rec'),
    path_imgidx=os.path.join(rec_path, 'train.idx'),
    label_width=2,
    data_shape=(3, 112, 112),
    batch_size=10,
    shuffle=False)
My questions are:
1) I do not know how data is usually stored in these types of files, e.g. which one contains labels. Any idea about these types of files?
2) How can I extract a subset of data to make a sample file? Also, what would be the file format (e.g. pickle file, txt file)?
You're only passing train.rec and train.idx to ImageRecordIter, so they are the only files being used. Your labels are stored alongside the image data in the train.rec file. You could use MXIndexedRecordIO to read individual records from these files by index and write a subset of them to a new pair of files. Something like:
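import mxnet as mx

# Read the first five records from the original dataset.
samples = []
record = mx.recordio.MXIndexedRecordIO('train.idx', 'train.rec', 'r')
for key in record.keys[:5]:  # keys come from train.idx and need not start at 0
    samples.append(record.read_idx(key))
record.close()

# Write them to a separate pair of files so the source is not overwritten.
record = mx.recordio.MXIndexedRecordIO('tmp.idx', 'tmp.rec', 'w')
for i, sample in enumerate(samples):
    record.write_idx(i, sample)
record.close()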