
Understanding mxnet.image.ImageDetIter


I am learning the MXNet framework and trying to run the SSD object detection example: https://gluon.mxnet.io/chapter08_computer-vision/object-detection.html

I train on an NVIDIA GTX 1050 GPU (4 GB) and work in a Jupyter notebook. Versions: Python 3.6, MXNet 1.3.1.

The tutorial says that training from scratch takes about 30 minutes on one GPU. I stopped after 3 hours, by which point the model had processed 24459 batches of 32 images each. The whole dataset is only 87.7 MB, which is far less than 24459*32*256*256 (each image is 256x256). I can't understand why it takes so long. Does image.ImageDetIter have some particular behavior here (for example, does it never stop by itself)?
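
To put rough numbers on that comparison, here is a minimal sketch of the arithmetic. It assumes each image decodes to 3 channels of one byte each and that 1 MB is 10^6 bytes, neither of which is stated above. It also compares decoded bytes with the on-disk file size, so the ratio is only an order-of-magnitude indication.

```python
batches_seen = 24459      # batches processed before interrupting
batch_size = 32
height = width = 256
channels = 3              # assumption: RGB, one byte per channel
dataset_bytes = 87.7e6    # on-disk dataset size, taking 1 MB = 10**6 bytes

images_seen = batches_seen * batch_size
decoded_bytes = images_seen * channels * height * width

print(images_seen, "images seen")
print(decoded_bytes / 1e9, "GB of decoded pixels")
print(decoded_bytes / dataset_bytes, "times the dataset size on disk")
```

A ratio in the thousands suggests the iterator passed over the same records many times rather than stopping after one epoch.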


Solution

  • Thanks for including the version info. You're absolutely correct: there was a bug in MXNet 1.3.0 where ImageDetIter looped indefinitely on the example you had. This was fixed in December 2018, and if you upgrade to MXNet 1.4.0 you won't see the issue. I confirmed this by running the code above.
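
A quick way to check whether an installed version falls in the range reported here is sketched below. `parse_version` and `in_reported_range` are hypothetical helpers written for this illustration, not part of MXNet. The 1.3.x range is taken from this answer only; other releases are not covered by it.

```python
import re

def parse_version(version):
    """Return the leading (major, minor, patch) integers of a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % (version,))
    return tuple(int(part) for part in match.groups())

def in_reported_range(version):
    """True for 1.3.x, the range this answer reports as looping indefinitely."""
    return (1, 3, 0) <= parse_version(version) < (1, 4, 0)

print(in_reported_range("1.3.1"))  # the version from the question
print(in_reported_range("1.4.0"))  # the version with the fix
```

With MXNet installed, the same check can be run on the real version string via `import mxnet as mx` and `in_reported_range(mx.__version__)`.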

    Another important note: "Deep Learning - The Straight Dope" has been deprecated in favor of Dive into Deep Learning (d2l.ai). Its content is up to date and is being used for a course on MXNet. Here's the corresponding chapter in the book.

    Additionally, videos from the course are posted here, if you want to watch them.

    As for the repro, I ran the following and confirmed that it loops indefinitely in 1.3.x and is fixed in 1.4.0:

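    from mxnet import image

    data_shape = 256  # assumed from the question: images are 256x256

    # Build the detection iterator over the Pikachu RecordIO files.
    train_iter = image.ImageDetIter(
            batch_size=1000,
            data_shape=(3, data_shape, data_shape),
            path_imgrec='./data/pikachu_train.rec',
            path_imgidx='./data/pikachu_train.idx',
            #shuffle=True,
            #mean=True,
            #rand_crop=1,
            min_object_covered=0.95,
            last_batch_handle='pad',
            max_attempts=5)
    train_iter.reset()
    # A single pass over the iterator should terminate on its own.
    for i, data in enumerate(train_iter):
        print(i + 1)  # goes forever on 1.3.0 but not 1.4.0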
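
If upgrading is not possible right away, one possible stopgap is to cap the number of batches per epoch yourself. This is a minimal sketch, not something from the MXNet API: `bounded_epoch` is a hypothetical helper, `num_samples=100` is a made-up figure, and `itertools.count()` stands in for an iterator that never stops. It also assumes the affected iterator simply wraps around the data, which I have not verified.

```python
import itertools
import math

def bounded_epoch(batches, num_samples, batch_size):
    """Return an iterator over at most one epoch's worth of batches."""
    max_batches = math.ceil(num_samples / batch_size)
    return itertools.islice(batches, max_batches)

# Stand-in for an iterator that never raises StopIteration.
endless = itertools.count()

seen = list(bounded_epoch(endless, num_samples=100, batch_size=32))
print(len(seen))  # 4 batches, since ceil(100 / 32) == 4
```

With the real iterator this would read `for i, data in enumerate(bounded_epoch(train_iter, num_samples, batch_size))`, where `num_samples` is the number of records in the .rec file, which you would have to supply yourself.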
    

    Hope that helps,

    Vishaal