
Does MXNet read training data from S3 in a streaming fashion?


This page talks about reading training data directly from an S3 bucket. Does anybody know whether the data is read in a streaming fashion, or whether the entire training data is copied to a local cache before training begins?


Solution

  • The data is read in a streaming fashion. If you want the entire file cached locally, you have to copy it yourself, either manually or with a script, before training begins.

    Note that some iterators might read the entire .rec file (to get some metadata) before training begins if a .lst file is not provided. It is a good idea to provide both the .rec and .lst files when creating the iterator.

    Example:

    import mxnet

    # Pass both the .rec and .lst files; the bucket name and paths are placeholders.
    itr = mxnet.image.ImageDetIter(batch_size=32, data_shape=(3, 300, 300),
                                   path_imgrec="s3://my_bucket_name/training_data/train.rec",
                                   path_imglist="s3://my_bucket_name/training_data/train.lst")
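
If you would rather cache the file locally before training, as mentioned above, one way to script it is sketched below. This is only a rough illustration, not something MXNet provides: it assumes the boto3 package is installed and AWS credentials are configured, and the helper names `split_s3_uri` and `cache_locally` as well as the cache directory `/tmp/mxnet_cache` are made up for this example.

```python
import os
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key). Hypothetical helper."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc or not parsed.path.strip("/"):
        raise ValueError(f"not an S3 object URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def cache_locally(uri, cache_dir="/tmp/mxnet_cache"):
    """Download the object once and return the local path. Hypothetical helper."""
    bucket, key = split_s3_uri(uri)
    local_path = os.path.join(cache_dir, bucket, key)
    if not os.path.exists(local_path):
        # Assumption: boto3 is available; it is imported only when a download is needed.
        import boto3
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        boto3.client("s3").download_file(bucket, key, local_path)
    return local_path
```

The returned local paths could then be passed as `path_imgrec` and `path_imglist` in place of the `s3://` URIs, so the iterator reads from local disk instead of streaming from the bucket.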