Tags: amazon-web-services, amazon-ec2, tensorflow, deep-learning, bazel

TensorFlow Inception retraining: bottleneck file creation


I'm following the tutorial to retrain the Inception model on my own problem. I have about 50,000 images in around 100 folders/categories.
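
For reference, the retrain script expects --image_dir to contain one sub-folder per category, with that category's images inside it. A minimal sketch of the layout (the folder and file names here are just hypothetical examples):

root_folder_name/
    category_1/
        photo_001.jpg
        photo_002.jpg
    category_2/
        photo_101.jpg
        photo_102.jpg
    ...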

Running this

bazel build tensorflow/examples/image_retraining:retrain

bazel-bin/tensorflow/examples/image_retraining/retrain --image_dir /path/to/root_folder_name

on an Amazon EC2 g2.2xlarge instance, I was hoping the full process would be quite fast (faster than on my laptop), but the bottleneck file creation is taking a long time. Given that it has already been 2 hours and only 800 files have been created, I will need more than 5 days (!!) just to create the files...
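
As a back-of-the-envelope check of that estimate (a small Python sketch using the numbers above):

total_images = 50000            # images in the dataset
files_per_hour = 800 / 2.0      # ~800 bottleneck files created in 2 hours
hours_needed = total_images / files_per_hour
print(hours_needed, "hours, i.e.", hours_needed / 24, "days")   # ~125 hours, ~5.2 days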

Is it supposed to be faster than this rate (~400 bottleneck files created per hour) because of the GPU?

How could I make the process faster?


Solution

  • Finally found the answer to my question.

    The Bazel build was running without GPU support. To solve this, I modified the files discussed in the related issues

    and ran

    TF_UNOFFICIAL_SETTING=1 ./configure

    bazel build -c opt --config=cuda tensorflow/examples/image_retraining:retrain --verbose_failures

    bazel-bin/tensorflow/examples/image_retraining/retrain --image_dir ~/Images/

    At the end of the day, the process was a lot faster (500 images per second), and the training itself also ran on the GPU!
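
    For anyone hitting the same issue, a quick way to confirm that the rebuilt TensorFlow actually sees the GPU is to run a tiny op with device placement logging. This is just a minimal sketch (not part of the original answer), using the pre-1.0 tf.Session API that matches the retrain example of that era:

    import tensorflow as tf

    # Log which device each op lands on; with a working CUDA build the
    # matmul below should be placed on /gpu:0.
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.constant([[1.0, 1.0], [1.0, 1.0]])
    c = tf.matmul(a, b)

    with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
        print(sess.run(c))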