
Train Tensorflow with my own images successfully, but still have problems


I am using Ubuntu 16.04 with a GeForce 1080 GPU (8 GB of GPU memory).

I created the TFRecord files properly and trained the model successfully. However, I still have two problems. The steps I followed are listed below; please tell me what I am missing.

I used VOCdevkit and created two files: pascal_train.record and pascal_val.record.

Then,

1- From this link, I took the raccoon images and placed them in models/object_detection/VOCdevkit/VOC2012/JPEGImages (after deleting the previous images).

Then I took the raccoon annotations and placed them in models/object_detection/VOCdevkit/VOC2012/Annotations (after deleting the previous ones).

2- I modified models/object_detection/data/pascal_label_map.pbtxt so that it contains a single class name, 'raccoon'.
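For reference, a one-class label map can be generated with a few lines of Python. This is a minimal sketch, not the actual file from this question: the helper name write_label_map and the temporary output path are hypothetical, and it assumes the usual pbtxt layout in which IDs start at 1 (0 is reserved for the background class) and each name matches the <name> tag used in the annotation XML files.

```python
import os
import tempfile


def write_label_map(path, class_names):
    """Write a label map in pbtxt text format, one item per class."""
    lines = []
    # IDs start at 1 because 0 is reserved for the background class.
    for class_id, name in enumerate(class_names, start=1):
        lines.append("item {")
        lines.append("  id: %d" % class_id)
        lines.append("  name: '%s'" % name)
        lines.append("}")
        lines.append("")
    with open(path, "w") as f:
        f.write("\n".join(lines))
    return path


# Demo: write to a temporary directory (hypothetical location; the real file
# would be models/object_detection/data/pascal_label_map.pbtxt).
demo_path = os.path.join(tempfile.mkdtemp(), "pascal_label_map.pbtxt")
write_label_map(demo_path, ["raccoon"])
with open(demo_path) as f:
    print(f.read())
```

If the name in the label map and the name in the annotations differ (for example 'raccoon' versus 'Raccoon'), the record-creation step may fail to map the objects to a class, so it is worth comparing the two.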

3- I used ssd_mobilenet_v1_pets.config and modified it: the number of classes is one, and instead of training from scratch I fine-tuned from ssd_mobilenet_v1_coco_11_06_2017/model.ckpt:

  fine_tune_checkpoint: "/home/jesse/abdu-py2/models/model/ssd_mobilenet_v1_coco_11_06_2017/model.ckpt"
  from_detection_checkpoint: true

4- Following this link, I arranged my directory structure like this:

  1. models
    1.1 model
      1.1.1 ssd_mobilenet_v1_pets.config
      1.1.2 train
      1.1.3 evaluation
      1.1.4 ssd_mobilenet_v1_coco_11_06_2017/model.ckpt
    1.2 object_detection
      1.2.1 data (contains pascal_train.record, pascal_val.record, and pascal_label_map.pbtxt)
      1.2.2 VOCdevkit
        1.2.2.1 VOC2012
          1.2.2.1.1 JPEGImages (my own images)
          1.2.2.1.2 Annotations (raccoon annotations)
          1.2.2.1.3 ImageSets
            1.2.2.1.3.1 Main (raccoon_train.txt, raccoon_val.txt, raccoon_train_val.txt)
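One quick way to sanity-check a layout like this is to confirm that every image has a matching annotation. The sketch below assumes the VOC-style convention that an image and its XML annotation share a base name; the function name find_unpaired and the demo file names are hypothetical, and the demo runs on a throwaway directory rather than the real VOC2012 folder.

```python
import os
import tempfile


def find_unpaired(voc_year_dir):
    """Return (images without annotation, annotations without image) by base name."""
    image_dir = os.path.join(voc_year_dir, "JPEGImages")
    annotation_dir = os.path.join(voc_year_dir, "Annotations")
    images = {
        os.path.splitext(f)[0]
        for f in os.listdir(image_dir)
        if f.lower().endswith((".jpg", ".jpeg"))
    }
    annotations = {
        os.path.splitext(f)[0]
        for f in os.listdir(annotation_dir)
        if f.lower().endswith(".xml")
    }
    return sorted(images - annotations), sorted(annotations - images)


# Demo on a temporary directory with hypothetical file names.
root = tempfile.mkdtemp()
for sub in ("JPEGImages", "Annotations"):
    os.makedirs(os.path.join(root, sub))
for rel in (
    ("JPEGImages", "raccoon-1.jpg"),
    ("JPEGImages", "raccoon-2.jpg"),
    ("Annotations", "raccoon-1.xml"),
):
    open(os.path.join(root, *rel), "w").close()

missing_xml, missing_jpg = find_unpaired(root)
print("images without annotation:", missing_xml)
print("annotations without image:", missing_jpg)
```

Pointing find_unpaired at models/object_detection/VOCdevkit/VOC2012 would list any leftovers from the deleted original dataset or any raccoon files that were not copied over.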
    

5- Then I trained my model:

(abdu-py2) jesse@jesse-System-Product-Name:~/abdu-py2/models$ python object_detection/train.py --logtostderr --pipeline_config_path=/home/jesse/abdu-py2/models/model/ssd_mobilenet_v1_pets.config --train_dir=/home/jesse/abdu-py2/models/model/train

Everything looks fine: after many thousands of training steps, the training directory contains many files, such as checkpoint and events.out.tfevents.1503337171 (and others).

However, my two problems are:

1- Based on this link, I cannot run the evaluation script eval.py at the same time as train.py (because of memory limits).

2- I tried to use the events.out.tfevents.1503337171 file created during training, but it seems it was not created correctly.

I don't know where my mistake is. I suspect my directory structure is not correct; I arranged it based on my own understanding.

Thanks in advance

Edit:

Regarding Q2:

I figured out how to convert the events files and model.ckpt files (created by the training process) into inference_graph_.pb. The inference_graph_.pb file can then be tested with object_detection_tutorial.ipynb. I tried it in my case, but nothing was detected, so I must have made a mistake somewhere in the train.py process.

The following command converts the trained files to a .pb file:

(abdu-py2) jesse@jesse-System-Product-Name:~/abdu-py2/models$ python object_detection/export_inference_graph.py \
    --input_type image_tensor \
    --pipeline_config_path /home/jesse/abdu-py2/models/model/ssd_mobilenet_v1_pets.config \
    --trained_checkpoint_prefix /home/jesse/abdu-py2/models/model/train/model.ckpt-27688 \
    --output_directory /home/jesse/abdu-py2/models/model
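The step number in --trained_checkpoint_prefix (27688 here) has to correspond to a checkpoint that actually exists in the training directory. A minimal sketch for finding the newest one, assuming checkpoints are saved as model.ckpt-<step>.index next to their .meta and .data files; the helper name latest_checkpoint_prefix is hypothetical, and where TensorFlow is importable, tf.train.latest_checkpoint(train_dir) serves the same purpose.

```python
import os
import re
import tempfile


def latest_checkpoint_prefix(train_dir):
    """Return the 'model.ckpt-<step>' prefix with the highest step, or None."""
    pattern = re.compile(r"^model\.ckpt-(\d+)\.index$")
    steps = []
    for filename in os.listdir(train_dir):
        match = pattern.match(filename)
        if match:
            steps.append(int(match.group(1)))
    if not steps:
        return None
    return os.path.join(train_dir, "model.ckpt-%d" % max(steps))


# Demo on a temporary directory with empty placeholder files.
demo_dir = tempfile.mkdtemp()
for step in (1200, 27688, 9500):
    open(os.path.join(demo_dir, "model.ckpt-%d.index" % step), "w").close()

prefix = latest_checkpoint_prefix(demo_dir)
print(prefix)
```

The returned prefix (without the .index, .meta, or .data suffix) is the form that --trained_checkpoint_prefix expects in the command above.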

Solution

  • Question 1 - This is a limitation of your hardware. Once you reach a point where you would like to evaluate the model, stop training and run your eval command (it seems you have already evaluated your model successfully, so you know the command). It will report metrics for the most recent model checkpoint. You can repeat this process until you are comfortable with the model's performance.

  • Question 2 - The events files are input for TensorBoard. They are in a binary format and therefore not human-readable. Start TensorBoard while your model is training and/or evaluating. To do so, run something like this:

    tensorboard --logdir=train:/home/grasp001/abdu-py2/models/object_detection/train1/train,eval:/home/grasp001/abdu-py2/models/object_detection/train1/eval

    Once TensorBoard is running, open localhost:6006 in your web browser to check your metrics. You can also use it during training to monitor the loss and other metrics at each training step.
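For Question 1, the stop-then-evaluate loop can be scripted. The sketch below only builds and prints an eval.py command line; it does not launch anything. The flag names (--pipeline_config_path, --checkpoint_dir, --eval_dir) are assumed to match the eval.py script in the same object_detection checkout and should be verified against it, and the directory paths are the ones from the question. The cpu_only_env helper is an optional addition that is not part of the answer above: an empty CUDA_VISIBLE_DEVICES hides the GPU from TensorFlow, so evaluation runs on the CPU, which may let it run alongside training at the cost of speed.

```python
import os
import shlex


def build_eval_command(config_path, checkpoint_dir, eval_dir):
    """Build the argument list for an evaluation run (flag names are assumptions)."""
    return [
        "python",
        "object_detection/eval.py",
        "--logtostderr",
        "--pipeline_config_path=" + config_path,
        "--checkpoint_dir=" + checkpoint_dir,
        "--eval_dir=" + eval_dir,
    ]


def cpu_only_env():
    """Copy the current environment and hide all GPUs from the child process."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env


# Paths taken from the question; adjust them to your own layout.
model_dir = "/home/jesse/abdu-py2/models/model"
cmd = build_eval_command(
    model_dir + "/ssd_mobilenet_v1_pets.config",
    model_dir + "/train",
    model_dir + "/evaluation",
)
print(" ".join(shlex.quote(part) for part in cmd))
```

To actually run it, pass the list to subprocess.call(cmd, cwd=...) from the models directory, adding env=cpu_only_env() only if a CPU-only evaluation is wanted.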