Tags: tensorflow, classification, multiclass-classification

How to train on 70k images using TensorFlow


I am new to TensorFlow and machine learning. I have a training set of 55k images divided into 40 categories. Some categories have ~2,000 images while others have ~20k. Each image is 1080x1440 pixels.
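For a sense of scale, here is a back-of-the-envelope estimate of how much memory the decoded images would need. It assumes 8-bit RGB (3 bytes per pixel, no compression) and uses the 299x299 input size of Inception V3 for the resized case; it only illustrates why the whole set can't simply be loaded into RAM at once.

```python
# Rough memory estimate for the dataset described above.
# Assumption: images are decoded to 8-bit RGB, i.e. 3 bytes per pixel.
NUM_IMAGES = 55_000
HEIGHT, WIDTH, CHANNELS = 1440, 1080, 3


def dataset_bytes(num_images, height, width, channels, bytes_per_value=1):
    """Bytes needed to hold every image fully decoded at the given size."""
    return num_images * height * width * channels * bytes_per_value


full_res = dataset_bytes(NUM_IMAGES, HEIGHT, WIDTH, CHANNELS)
# Inception V3 takes 299x299 inputs, so images are resized before training.
resized = dataset_bytes(NUM_IMAGES, 299, 299, CHANNELS)

print(f"full resolution: {full_res / 1e9:.1f} GB")  # prints "full resolution: 256.6 GB"
print(f"resized to 299x299: {resized / 1e9:.1f} GB")  # prints "resized to 299x299: 14.8 GB"
```

Storing pixels as float32 instead of uint8 would multiply both figures by 4.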

I am retraining an ImageNet-pretrained Inception V3 model on these images with TensorFlow, but the results are poor: the model does not classify the images correctly, and the score for the expected label is very low for almost all images in the test set.

For retraining, my command is:

python retrain.py --image_dir=train_images --how_many_training_steps=4000 --output_graph=output_graph.pb --output_labels=output_labels.txt --bottleneck_dir=bottlenecks --saved_model_dir=saved_models

I am not using other parameters such as scaling, cropping, test batch size or validation batch size, because I am not familiar with how to use them.

For labeling, my command is:

python label_image.py --graph=output_graph.pb --labels=output_labels.txt  --input_layer=Placeholder --output_layer=final_result --image=51.jpg

Can someone help me with how to use these input parameters to get the best results?

Thanks in advance!


Solution

  • This question is very broad, but here are a couple of things.

    1. I'd suggest using Keras with the TensorFlow backend, as its abstractions make it easier to understand what is going on. There are also plenty of examples available for the Keras framework.

    2. Keras has utility classes that help with loading large amounts of data that won't fit into memory. With many classes to predict, using the default ImageDataGenerator (its flow_from_directory method) may be impractical, because it takes each image's label from the name of the directory the file resides in. (With 40 classes, for example, you need a training data folder with 40 subfolders as well as a test data folder with 40 subfolders, plus duplicates of these if you're doing cross-validation.)

    More info here:

    https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html

    3. If you don't organize the images into class folders, you need to write your own generator that loads the files and also returns their labels.

    For this scenario, I recommend looking at this:

    https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly.html
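The folder-based labelling that ImageDataGenerator's flow_from_directory relies on can be sketched with the standard library alone. This is not Keras's implementation, just an illustration of the convention: each subfolder of the root is one class, and class indices follow the alphanumeric order of the folder names (which is what Keras documents for the default classes=None). The helper name index_class_folders and the set of accepted extensions are assumptions made for this sketch.

```python
import os
import tempfile

# Assumption: which file extensions count as images (Keras accepts a few more).
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def index_class_folders(root):
    """Hypothetical helper: map a root/<class_name>/<image> tree to labels.

    Returns (class_indices, samples) where class_indices maps each subfolder
    name to an integer (sorted order) and samples is a list of
    (file_path, label_index) pairs for every image file found.
    """
    class_names = sorted(
        name for name in os.listdir(root)
        if os.path.isdir(os.path.join(root, name))
    )
    class_indices = {name: i for i, name in enumerate(class_names)}
    samples = []
    for name in class_names:
        folder = os.path.join(root, name)
        for filename in sorted(os.listdir(folder)):
            # Skip non-image files such as notes or metadata.
            if os.path.splitext(filename)[1].lower() in IMAGE_EXTENSIONS:
                samples.append((os.path.join(folder, filename), class_indices[name]))
    return class_indices, samples


# Tiny demo on a throwaway directory tree with two classes.
with tempfile.TemporaryDirectory() as root:
    for rel in ("cats/a.jpg", "dogs/b.JPG", "dogs/notes.txt"):
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "wb").close()  # empty placeholder file
    class_indices, samples = index_class_folders(root)
    labels = [label for _, label in samples]

print(class_indices)  # prints {'cats': 0, 'dogs': 1}
```

With 40 class folders this yields a 40-entry mapping, the same kind of name-to-index dict that Keras exposes as class_indices on the iterator returned by flow_from_directory.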
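For the case without class folders, a custom generator only needs parallel lists of file paths and labels (read, for example, from a CSV file; where the labels come from is up to you). The sketch below is a hypothetical FileLabelBatches class that mimics the tf.keras.utils.Sequence interface (__len__, __getitem__, on_epoch_end) without importing Keras, so it runs anywhere; load_fn is a placeholder for whatever decodes and resizes an image. In a real project you would subclass tf.keras.utils.Sequence and return NumPy arrays from __getitem__.

```python
import math
import random


class FileLabelBatches:
    """Hypothetical batch generator over (file path, label) pairs.

    Mirrors the Sequence-style interface (__len__, __getitem__, on_epoch_end)
    but is plain Python; not part of Keras.
    """

    def __init__(self, paths, labels, batch_size, load_fn, shuffle=True, seed=0):
        if len(paths) != len(labels):
            raise ValueError("paths and labels must have the same length")
        self.paths = list(paths)
        self.labels = list(labels)
        self.batch_size = batch_size
        self.load_fn = load_fn  # placeholder: path -> decoded image
        self.shuffle = shuffle
        self._rng = random.Random(seed)
        self.order = list(range(len(self.paths)))
        self.on_epoch_end()  # shuffle once before the first epoch

    def __len__(self):
        # Number of batches per epoch; the last batch may be smaller.
        return math.ceil(len(self.paths) / self.batch_size)

    def __getitem__(self, index):
        # Take the next slice of shuffled indices and load only those files.
        start = index * self.batch_size
        batch_ids = self.order[start:start + self.batch_size]
        inputs = [self.load_fn(self.paths[i]) for i in batch_ids]
        targets = [self.labels[i] for i in batch_ids]
        return inputs, targets

    def on_epoch_end(self):
        # Reshuffle the index order so each epoch sees a different batch order.
        if self.shuffle:
            self._rng.shuffle(self.order)


# Usage with dummy data: load_fn just returns the path instead of pixels.
paths = [f"img_{i}.jpg" for i in range(10)]
labels = [i % 3 for i in range(10)]
batches = FileLabelBatches(paths, labels, batch_size=4, load_fn=lambda p: p)
print(len(batches))  # prints 3
```

Shuffling an index list rather than the paths and labels themselves keeps each file paired with its label, and only the files in the requested batch are ever loaded into memory.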