
Unable to load pre-trained model checkpoint with TensorFlow Object Detection API


This is similar to the question "Where can I find model.ckpt in faster_rcnn_resnet50_coco model?", but the solution given there doesn't work for me.

I have downloaded ssd_resnet152_v1_fpn_1024x1024_coco17_tpu-8 to use as a starting point, and I am using the sample model configuration that goes with that model in the TF model zoo.

I am only changing num_classes and the paths for fine-tuning, training, and eval.

With:

fine_tune_checkpoint: "C:\\Users\\Peter\\Desktop\\Adv-ML-Project\\models\\research\\object_detection\\test_data\\checkpoint\\model.ckpt"

I get:

tensorflow.python.framework.errors_impl.NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for C:\Users\Pierre\Desktop\Adv-ML-Project\models\research\object_detection\test_data\checkpoint\model.ckpt

With:

fine_tune_checkpoint: "C:\\Users\\Peter\\Desktop\\Adv-ML-Project\\models\\research\\object_detection\\test_data\\checkpoint\\ckpt-0.*"

I get:

tensorflow.python.framework.errors_impl.DataLossError: Unable to open table file C:\Users\Pierre\Desktop\Adv-ML-Project\models\research\object_detection\test_data\checkpoint\ckpt-0.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?

I'm currently using absolute paths because it's easiest, but if it's a problem I can re-organize my project structure.

[Screenshot: checkpoint folder]

The official documentation from https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_training_and_evaluation.md says to do something like

fine_tune_checkpoint: a path prefix to the pre-existing checkpoint (ie:"/usr/home/username/checkpoint/model.ckpt-#####").

Is there something I am doing wrong here? I am running this with the following command (also from documentation):

python object_detection/model_main_tf2.py \
    --pipeline_config_path="C:\\Users\\Pierre\\Desktop\\Adv-ML-Project\\models\\my_model\\my_model.config" \
    --model_dir="C:\\Users\\Pierre\\Desktop\\Adv-ML-Project\\models\\my_model\\training" \
    --alsologtostderr

Solution

  • Try setting fine_tune_checkpoint in the config file to the checkpoint prefix rather than to a file on disk, something like path_to_folder/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8/checkpoint/ckpt-0 (no .index or .data-00000-of-00001 suffix, and no wildcard).

    In your training command, point the model_dir flag at the model directory itself and leave off the training subfolder, something like --model_dir=<path_to>/ssd_resnet152_v1_fpn_1024x1024_coco17_tpu-8


    Also change the backslashes in the paths to forward slashes, since you're on Windows.
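
A quick way to sanity-check a prefix before starting training is sketched below. It is only an illustration using the Python standard library: the helper names (to_forward_slashes, checkpoint_prefix_is_valid) and the example path are hypothetical and not part of the Object Detection API. It relies on the fact that a TF2 checkpoint saved under the prefix ckpt-0 is stored on disk as ckpt-0.index plus one or more ckpt-0.data-* shards, so the prefix itself never exists as a file.

```python
import glob
from pathlib import Path, PureWindowsPath


def to_forward_slashes(windows_path: str) -> str:
    """Rewrite a Windows path with forward slashes for the pipeline config."""
    return PureWindowsPath(windows_path).as_posix()


def checkpoint_prefix_is_valid(prefix: str) -> bool:
    """Return True if `prefix` looks like a TF2 checkpoint prefix.

    The prefix is valid when "<prefix>.index" exists and at least one
    "<prefix>.data-*" shard sits next to it.
    """
    p = Path(prefix)
    has_index = (p.parent / f"{p.name}.index").is_file()
    # Escape the name so characters such as "*" are matched literally.
    has_data = any(p.parent.glob(glob.escape(p.name) + ".data-*"))
    return has_index and has_data


if __name__ == "__main__":
    # Hypothetical location; substitute the folder you extracted the model to.
    prefix = to_forward_slashes(r"C:\path_to_folder\checkpoint\ckpt-0")
    print(prefix)
    print(checkpoint_prefix_is_valid(prefix))
```

If the check fails for the value you put in fine_tune_checkpoint, the path is probably pointing at a file name (model.ckpt, ckpt-0.data-00000-of-00001) or a wildcard instead of the bare ckpt-0 prefix.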