I have a previously exported RetinaNet model (originally from the object detection model zoo) that was fine-tuned on a custom dataset with the TensorFlow Object Detection API (TensorFlow version 2.4.1). Below is how the exported model's folder looks.
Running the evaluation on the model (with the command below) gives an mAP of 0.5.
python model_main_tf2.py --model_dir=exported-models/retinanet --pipeline_config_path=exported-models/retinanet/pipeline.config --checkpoint_dir=exported-models/retinanet/checkpoint
Due to unfortunate circumstances, I do not have the training folder from when the model was trained. As I recently got more data, I would like to use the exported model as a starting point for further training, so I have set fine_tune_checkpoint to "exported-models/retinanet/checkpoint/ckpt-0" in the pipeline.config for the new training:
fine_tune_checkpoint: "exported-models/retinanet/checkpoint/ckpt-0"
num_steps: 25000
startup_delay_steps: 0.0
replicas_to_aggregate: 8
max_number_of_boxes: 100
unpad_groundtruth_tensors: false
fine_tune_checkpoint_type: "detection"
use_bfloat16: false
fine_tune_checkpoint_version: V2
However, when starting the training with the model_main_tf2.py script, the first checkpoint (which is at step 0) has a terrible score, even on the same dataset that the evaluation was run on for the exported model.
I would expect the first checkpoint to have the same score as the exported model, at least on the same test set. Is this assumption wrong, and if so, why?
I finally found the following in the Object Detection API's train.proto:
// Whether to load all checkpoint vars that match model variable names and
// sizes. This option is only available if `from_detection_checkpoint` is
// True. This option is *not* supported for TF2 --- setting it to true
// will raise an error. **Instead, set fine_tune_checkpoint_type: 'full'.**
optional bool load_all_detection_checkpoint_vars = 19 [default = false];
By setting fine_tune_checkpoint_type to "full", I got the correct mAP for the first checkpoint (at 0 steps).
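For anyone who wants to script the change rather than edit pipeline.config by hand, here is a minimal sketch using only the Python standard library. The helper name set_fine_tune_checkpoint_type is hypothetical (it is not part of the Object Detection API), and it assumes the field is written on one line with a double-quoted value, as in the config excerpt above.

```python
import re


def set_fine_tune_checkpoint_type(config_text: str, new_type: str = "full") -> str:
    """Return config_text with fine_tune_checkpoint_type set to new_type.

    Hypothetical helper: works on the raw text of a pipeline.config and
    assumes the value is a double-quoted string on the same line as the key.
    """
    # Match only the *_type field, not fine_tune_checkpoint or *_version.
    pattern = re.compile(r'(fine_tune_checkpoint_type\s*:\s*)"[^"]*"')
    updated, count = pattern.subn(
        lambda m: f'{m.group(1)}"{new_type}"', config_text
    )
    if count == 0:
        raise ValueError("fine_tune_checkpoint_type not found in config text")
    return updated


# Excerpt mirroring the train_config fields from the question.
example = '''fine_tune_checkpoint: "exported-models/retinanet/checkpoint/ckpt-0"
num_steps: 25000
fine_tune_checkpoint_type: "detection"
fine_tune_checkpoint_version: V2
'''

patched = set_fine_tune_checkpoint_type(example)
print(patched)
```

To apply it to a real file, read pipeline.config into a string, pass it through the helper, and write the result back. The rest of the config, including the fine_tune_checkpoint path, is left untouched.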