How can I stop the training loss from increasing dramatically in the first 100 steps of object detection, with an accuracy of -1?

I am trying to configure transfer learning on a simple object detection problem using TensorFlow and the Object Detection API. During training, the initial loss can be quite good, but it increases drastically (e.g. from 0.043 to 1691411200) in the first 100 steps and then slowly decreases. When I run inference on the data used for training, I get no bounding box on the object.

Using the scripts at https://github.com/douglasrizzo/detection_util_scripts to create the eval and training data, I set up the data for a simple one-class detection problem. I have uploaded an example image and label as a TensorFlow record here: https://ufile.io/eimarmj0

I suspect that the labeling is wrong (partly because TensorBoard does not show any bounding boxes, even for the ground truth). I have tried most sensible and some insensible configurations for xmin, ymin, xmax, ymax, but all of them give the same training pattern.

The model is ssd_mobilenet_v1_0.75_depth_300x300_coco14_sync_2018_07_03 from the model zoo.

Pipeline is here: https://gist.github.com/vlschmidt/522f4efd8d62f6488eaf1d59ee098be4

Tensorflow version: '1.14.0'

How can I troubleshoot this error, and where should I be looking for documentation?

Solution

A few steps that will help you debug:

1. Make sure the file names of your training images do not contain any spaces. For example, an image can be named "cat1.jpg", but not "cat 1.jpg", "cat1 .jpg", etc.
2. Make sure your image files are not corrupted. Try to open each image with 'cv2.imread()'; if that does not read the image, the image is certainly corrupted.
3. Start annotating the images only after points 1 and 2 have been checked carefully.
4. Make sure you do not get any errors while generating the TF record.
5. Make sure each image is larger than 300x300 (width x height).
6. Recheck the paths in fine_tune_checkpoint: ".../pathto/model.ckpt", label_map_path: ".../pathto/pbtxtxt_input.pbtxt", and input_path: ".../pathto/ttt_tensorm_train.record", and make sure they are all correct.
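
The file-name, corruption and size checks from the list above can be sketched in Python roughly as follows. This is only a minimal sketch: the function names check_image and read_shape and the constant MIN_SIDE are made up for illustration, and the 300 pixel limit is taken from the list above rather than from any official requirement. cv2.imread() returns None when it cannot decode a file, which is what the corruption check relies on.

```python
import os

# Assumed limit, taken from the "larger than 300x300" advice in the list.
MIN_SIDE = 300


def check_image(path, shape):
    """Return a list of problems found for one image.

    shape is (height, width), or None if the image could not be read.
    """
    problems = []
    # Only the file name is checked for spaces, not the directories.
    if " " in os.path.basename(path):
        problems.append("space in file name")
    if shape is None:
        problems.append("unreadable or corrupted")
    elif shape[0] <= MIN_SIDE or shape[1] <= MIN_SIDE:
        problems.append("not larger than 300x300")
    return problems


def read_shape(path):
    """Read an image with OpenCV and return (height, width), or None."""
    import cv2  # imported here so check_image works without OpenCV

    image = cv2.imread(path)  # None if the file is missing or corrupted
    return None if image is None else image.shape[:2]
```

To use it, call check_image(path, read_shape(path)) for every image in your training folder and fix or drop every image that returns a non-empty list before you annotate.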

Run the training for at least 1000 iterations before running inference.
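
Since the question suspects the labels, it may also be worth checking the box coordinates before they go into the TF record. The sketch below assumes the usual Object Detection API convention that xmin, ymin, xmax and ymax are stored normalized to the range [0, 1] (pixel values divided by the image width or height). The helper name box_problems is hypothetical and not part of the API.

```python
def box_problems(xmin, ymin, xmax, ymax):
    """Return a list of problems found for one bounding box.

    Assumes coordinates are normalized to [0, 1], not given in pixels.
    """
    problems = []
    coords = {"xmin": xmin, "ymin": ymin, "xmax": xmax, "ymax": ymax}
    for name, value in coords.items():
        if not 0.0 <= value <= 1.0:
            problems.append(f"{name}={value} is outside [0, 1]")
    # A valid box has a positive width and height.
    if xmin >= xmax:
        problems.append("xmin is not smaller than xmax")
    if ymin >= ymax:
        problems.append("ymin is not smaller than ymax")
    return problems
```

If a box given in pixels, such as box_problems(30, 40, 120, 200), reports every coordinate as outside [0, 1], the labels were probably not divided by the image size when the record was written.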