TensorFlow custom object detector: model_main_tf2 doesn't start training


Problem summary: The TensorFlow custom object detector never starts fine-tuning when I follow the guide in the docs. It doesn't throw an exception either.

What I've done: I installed the Object Detection API and ran the installation test successfully, as described in the docs.

I then followed the guide on training a custom object detector here, including modifying the pipeline.config file. As per the guide, I run

model_main_tf2.py  --model_dir=<path1> --pipeline_config_path=<path2> --alsologtostderr

where path1 and path2 are paths like

 D:/COCO/models/workspace/duck-demo/pre-trained-models/efficientdet_d1_coco17_tpu-32/pipeline.config

The output is shown below. According to the guide, this output, including its many warnings, is expected. However, training should start afterwards. Instead, the script just returns, with no error and no training. What seems to be the problem here?

output:

...
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._feature_extractor._bifpn_stage.node_input_blocks.7.0.1.1.gamma
W0326 09:24:46.180965 16300 util.py:160] Unresolved object in checkpoint: (root).model._feature_extractor._bifpn_stage.node_input_blocks.7.0.1.1.gamma
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._feature_extractor._bifpn_stage.node_input_blocks.7.0.1.1.beta
W0326 09:24:46.180965 16300 util.py:160] Unresolved object in checkpoint: (root).model._feature_extractor._bifpn_stage.node_input_blocks.7.0.1.1.beta
...
WARNING:tensorflow:A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. 
Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/guide/checkpoint#loading_mechanics for details.
W0326 09:24:46.181965 16300 util.py:168] A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. 
Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/guide/checkpoint#loading_mechanics for details.

Solution

  • There is a GitHub issue here that discusses many possible solutions to this problem for different types of TensorFlow 2 models. There's a good chance one of them will help.

    As a rule of thumb, always test your installation by running python object_detection/builders/model_builder_tf2_test.py before training anything, so that any problems are caught early.
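
    Because the script exits without printing an error, it can also help to look at the process exit code and to rule out a wrong path before digging through the GitHub issue. The following is only a minimal sketch, not part of the Object Detection API: the helper names check_paths and run_and_report are made up for illustration, and <path1> and <path2> are the placeholders from the question, to be replaced with your own paths.

```python
import subprocess
import sys
from pathlib import Path


def check_paths(model_dir, pipeline_config_path):
    """Return a list of problems found with the two command-line paths.

    Hypothetical helper: it only checks the file system and does not
    validate the contents of pipeline.config.
    """
    problems = []
    if not Path(pipeline_config_path).is_file():
        problems.append(f"pipeline config not found: {pipeline_config_path}")
    model_path = Path(model_dir)
    if model_path.exists() and not model_path.is_dir():
        problems.append(f"model_dir exists but is not a directory: {model_dir}")
    return problems


def run_and_report(cmd, tail_lines=5):
    """Run cmd and return (exit_code, last lines of stdout and stderr combined)."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return result.returncode, result.stdout.splitlines()[-tail_lines:]


if __name__ == "__main__":
    # Placeholders taken from the question; substitute real paths.
    model_dir = "<path1>"
    pipeline_config_path = "<path2>"

    for problem in check_paths(model_dir, pipeline_config_path):
        print("Path problem:", problem)

    # Installation test mentioned in the answer above.
    code, tail = run_and_report(
        [sys.executable, "object_detection/builders/model_builder_tf2_test.py"]
    )
    print("installation test exit code:", code)

    # Training command from the question, run through the same helper.
    code, tail = run_and_report(
        [
            sys.executable,
            "model_main_tf2.py",
            f"--model_dir={model_dir}",
            f"--pipeline_config_path={pipeline_config_path}",
            "--alsologtostderr",
        ]
    )
    print("training exit code:", code)
    print("\n".join(tail))
```

    A non-zero exit code from the training command would suggest that the process ended abnormally even though no Python exception was printed, which is useful information to include when comparing against the cases in the GitHub issue.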