python tensorflow computer-vision object-detection-api mask-rcnn

Error with pre-trained Mask R-CNN Inception ResNet V2 1024x1024 model using TensorFlow Object Detection API

I am attempting to fine-tune a pre-trained Mask R-CNN Inception ResNet V2 1024x1024 model with the TensorFlow Object Detection API for a custom task. I downloaded the model from the TensorFlow Model Zoo.

I have created a pipeline configuration for this model, specifying my training and evaluation TFRecord datasets and the path to the downloaded checkpoint as the fine_tune_checkpoint.
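Before training, it can help to confirm which fine-tune fields the pipeline actually sets. The sketch below assumes a text-format pipeline.config; read_fine_tune_fields is a hypothetical helper that does a simple regex scan rather than real protobuf parsing. A value of None means the field is absent from the file, so the Object Detection API will use that field's default.

```python
import re

# train_config fields relevant to fine-tuning from a pre-trained checkpoint.
FIELDS = ("fine_tune_checkpoint", "fine_tune_checkpoint_type",
          "fine_tune_checkpoint_version")


def read_fine_tune_fields(config_text: str) -> dict:
    """Return {field: value or None} for the fine-tune fields in the text.

    Hypothetical helper: a regex scan of a text-format pipeline.config, not a
    protobuf parser, so it only reports the first occurrence of each field.
    """
    found = {}
    for name in FIELDS:
        # Match `name: value` on its own line; the value may be quoted.
        # `fine_tune_checkpoint` does not match `fine_tune_checkpoint_type:`
        # because the pattern requires optional blanks and then a colon.
        match = re.search(rf'^[ \t]*{name}[ \t]*:[ \t]*"?([^"\n]*?)"?[ \t]*$',
                          config_text, flags=re.MULTILINE)
        found[name] = match.group(1) if match else None
    return found


if __name__ == "__main__":
    example = '''train_config {
  fine_tune_checkpoint: "/content/checkpoint/ckpt-0"
  fine_tune_checkpoint_type: "detection"
}'''
    print(read_fine_tune_fields(example))
```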

However, when I run the model_main_tf2.py script to start training, it fails while loading the fine-tune checkpoint with the following error:

Traceback (most recent call last):
  File "/content/models/research/object_detection/model_main_tf2.py", line 114, in <module>
    tf.compat.v1.app.run()
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/platform/app.py", line 36, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.10/dist-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.10/dist-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/content/models/research/object_detection/model_main_tf2.py", line 105, in main
    model_lib_v2.train_loop(
  File "/usr/local/lib/python3.10/dist-packages/object_detection/model_lib_v2.py", line 605, in train_loop
    load_fine_tune_checkpoint(
  File "/usr/local/lib/python3.10/dist-packages/object_detection/model_lib_v2.py", line 398, in load_fine_tune_checkpoint
    raise ValueError('Checkpoint version should be V2')
ValueError: Checkpoint version should be V2

I suspected a mismatch between the model architecture defined in my pipeline and that of the pre-trained model. However, as far as I can tell, my pipeline configuration is set up correctly for the Mask R-CNN Inception ResNet V2 1024x1024 model.

Furthermore, I inspected the checkpoint with the inspect_checkpoint.py script, and it appears to include all the variables expected for this model. The downloaded checkpoint files are ckpt-0.index, ckpt-0.data-00000-of-00001, and checkpoint.
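Those three files describe a single TF2 checkpoint addressed by its prefix (here ckpt-0), and as I understand it fine_tune_checkpoint should point at that prefix, for example .../checkpoint/ckpt-0, rather than at the directory or the .index file. A minimal stdlib-only sketch for locating the prefix; find_checkpoint_prefix is a hypothetical helper, not part of the Object Detection API.

```python
from pathlib import Path


def find_checkpoint_prefix(directory: str) -> str:
    """Return the checkpoint prefix (e.g. '<dir>/ckpt-0') found in directory.

    Hypothetical helper: a TF2 checkpoint on disk is '<prefix>.index' plus
    one or more '<prefix>.data-XXXXX-of-YYYYY' shards.
    """
    root = Path(directory)
    prefixes = []
    for index_file in sorted(root.glob("*.index")):
        prefix = index_file.with_suffix("")  # 'ckpt-0.index' -> 'ckpt-0'
        # Only accept prefixes that also have at least one data shard.
        if list(root.glob(prefix.name + ".data-*")):
            prefixes.append(str(prefix))
    if len(prefixes) != 1:
        raise FileNotFoundError(
            f"expected exactly one checkpoint in {directory}, found {prefixes}")
    return prefixes[0]
```

The returned string is the value to place in fine_tune_checkpoint. The plain checkpoint file (no extension) is just a small state file that records the latest prefix, so the helper ignores it.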

I am running this on Google Colab with TensorFlow version 2.12.0 and Python version 3.10.0. I would greatly appreciate any guidance or solutions to this problem.

To Reproduce

Steps to reproduce the behavior:

Download the pre-trained Mask R-CNN Inception ResNet V2 1024x1024 model from the TensorFlow Model Zoo.
Set up a custom training pipeline configuration, specifying the path to the downloaded checkpoint in the fine_tune_checkpoint field.
Run the model training script (model_main_tf2.py).
The run fails while loading the fine-tune checkpoint with ValueError: Checkpoint version should be V2.
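Step 3 above can be sketched as follows. This assumes the usual model_main_tf2.py flags --pipeline_config_path and --model_dir plus the absl flag --alsologtostderr; the script path comes from the traceback, while the config and model directory paths are hypothetical placeholders, and build_train_command is a hypothetical helper.

```python
import shlex
import sys


def build_train_command(script: str, pipeline_config: str, model_dir: str) -> list:
    """Return argv for launching Object Detection API training.

    Hypothetical helper. --pipeline_config_path and --model_dir are flags of
    model_main_tf2.py; --alsologtostderr is the absl flag that echoes logs.
    """
    return [
        sys.executable, script,
        f"--pipeline_config_path={pipeline_config}",
        f"--model_dir={model_dir}",
        "--alsologtostderr",
    ]


if __name__ == "__main__":
    cmd = build_train_command(
        "/content/models/research/object_detection/model_main_tf2.py",
        "/content/pipeline.config",  # hypothetical path
        "/content/training",         # hypothetical path
    )
    # Print a shell-ready version; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```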

Expected behavior

I expect training to start by loading the weights from the specified pre-trained checkpoint, since my pipeline configuration appears to be set up correctly for the Mask R-CNN Inception ResNet V2 1024x1024 model.

Additional context

When I inspected the checkpoint with inspect_checkpoint.py, it appeared to contain all the expected variables for a Mask R-CNN Inception ResNet V2 1024x1024 model. I also confirmed that the downloaded files include ckpt-0.index, ckpt-0.data-00000-of-00001, and checkpoint. Yet the issue persists. Any guidance or solutions to this problem would be greatly appreciated.

I have attached my pipeline.config file below: pipeline.txt

System information

OS Platform and Distribution: Google Colab
TensorFlow installed from (source or binary): source
TensorFlow version: 2.12.0
Python version: 3.10.0
GPU model and memory: A100; 40GB

Solution

Add the following fields to the train_config section, directly below your fine_tune_checkpoint entry:

fine_tune_checkpoint_version: V2
fine_tune_checkpoint_type: "detection"
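My understanding (worth checking against the installed model_lib_v2.py) is that load_fine_tune_checkpoint raises this ValueError when fine_tune_checkpoint_version is V1, and V1 is what the pipeline uses when the field is omitted. If you prefer to patch the config from a Colab cell, here is a minimal sketch; add_v2_fields is a hypothetical helper that edits the pipeline.config text with a regex. It leaves any field that is already present untouched, so an existing V1 or wrong type still has to be fixed by hand.

```python
import re

# The two lines suggested in the Solution section, in that order.
REQUIRED = {
    "fine_tune_checkpoint_version": "V2",
    "fine_tune_checkpoint_type": '"detection"',
}


def add_v2_fields(config_text: str) -> str:
    """Insert missing fine-tune fields right after `fine_tune_checkpoint:`.

    Hypothetical helper: a text edit, not a protobuf round-trip, so the
    result is not validated against the Object Detection API schema.
    """
    anchor = re.search(r"^([ \t]*)fine_tune_checkpoint[ \t]*:.*$",
                       config_text, flags=re.MULTILINE)
    if anchor is None:
        raise ValueError("no fine_tune_checkpoint entry found")
    indent = anchor.group(1)  # reuse the indentation of the anchor line
    new_lines = [
        f"{indent}{key}: {value}"
        for key, value in REQUIRED.items()
        # Skip fields the config already sets.
        if not re.search(rf"^[ \t]*{key}[ \t]*:", config_text, flags=re.MULTILINE)
    ]
    if not new_lines:
        return config_text
    end = anchor.end()  # end of the fine_tune_checkpoint line
    return config_text[:end] + "\n" + "\n".join(new_lines) + config_text[end:]
```

Typical use: read pipeline.config, pass its text through add_v2_fields, write it back, then rerun model_main_tf2.py.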

Also note that there appears to be an open issue with this model:

https://github.com/tensorflow/models/issues/9546