Tags: tensorflow, machine-learning, computer-vision, tensorflow2.0, object-detection

TensorFlow Object Detection API: Train from exported model checkpoint


I previously exported a RetinaNet model (originally from the object detection model zoo) that had been fine-tuned on a custom dataset with the TensorFlow Object Detection API (TensorFlow version 2.4.1). The exported model's folder looks like this:

[Screenshot: contents of the exported model folder]

When I run the evaluation on the model with the command below, it reports an mAP of 0.5.

python model_main_tf2.py --model_dir=exported-models/retinanet --pipeline_config_path=exported-models/retinanet/pipeline.config --checkpoint_dir=exported-models/retinanet/checkpoint

The question

Due to unfortunate circumstances, I no longer have the training folder from when the model was trained. Since I recently got more data, I would like to use the exported model as a starting point for further training, so I set fine_tune_checkpoint to "exported-models/retinanet/checkpoint/ckpt-0" in the train_config of the new pipeline.config:

  fine_tune_checkpoint: "exported-models/retinanet/checkpoint/ckpt-0"
  num_steps: 25000
  startup_delay_steps: 0.0
  replicas_to_aggregate: 8
  max_number_of_boxes: 100
  unpad_groundtruth_tensors: false
  fine_tune_checkpoint_type: "detection"
  use_bfloat16: false
  fine_tune_checkpoint_version: V2
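Note that fine_tune_checkpoint expects a checkpoint prefix such as ckpt-0, not one of the files on disk (ckpt-0.index, ckpt-0.data-00000-of-00001). The checkpoint directory written by the exporter normally also contains a small text file named checkpoint that records this prefix; inside TensorFlow, tf.train.latest_checkpoint(checkpoint_dir) reads it for you. The sketch below does the same with only the standard library, as a quick way to double-check the path before starting a run. checkpoint_prefix_from_state is a hypothetical helper, and the regex assumes the usual `model_checkpoint_path: "..."` line in that file.

```python
import posixpath
import re

# Matches the line TensorFlow's checkpoint state file uses for the most
# recent checkpoint, e.g.  model_checkpoint_path: "ckpt-0"
_STATE_LINE = re.compile(r'^model_checkpoint_path:\s*"([^"]+)"', re.MULTILINE)


def checkpoint_prefix_from_state(state_text: str, checkpoint_dir: str) -> str:
    """Return the prefix to put in fine_tune_checkpoint (hypothetical helper).

    state_text is the contents of the text file named `checkpoint` that sits
    next to ckpt-N.index inside checkpoint_dir.
    """
    match = _STATE_LINE.search(state_text)
    if match is None:
        raise ValueError("no model_checkpoint_path entry in checkpoint state file")
    # The recorded value is a prefix such as "ckpt-0"; TensorFlow appends
    # ".index" / ".data-?????-of-?????" itself when restoring. Relative values
    # are resolved against checkpoint_dir, absolute ones are kept as-is.
    return posixpath.join(checkpoint_dir, match.group(1))


# Usage with the file contents inlined (normally read from disk, e.g.
# open("exported-models/retinanet/checkpoint/checkpoint").read()).
state = 'model_checkpoint_path: "ckpt-0"\nall_model_checkpoint_paths: "ckpt-0"\n'
print(checkpoint_prefix_from_state(state, "exported-models/retinanet/checkpoint"))
# exported-models/retinanet/checkpoint/ckpt-0
```

posixpath is used so the result always has forward slashes, which is how paths are usually written in pipeline.config.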

However, when I start the training with the model_main_tf2.py script, the first checkpoint (at step 0) scores terribly, even on the same dataset that the exported model was evaluated on.

I would expect the first checkpoint to have the same score as the exported model, at least on the same test set. Is this assumption wrong, and if so, why?


Solution

  • I finally found the following comment in the Object Detection API's protobuf definitions:

    // Whether to load all checkpoint vars that match model variable names and
    // sizes. This option is only available if `from_detection_checkpoint` is
    // True.  This option is *not* supported for TF2 --- setting it to true
    // will raise an error. Instead, set fine_tune_checkpoint_type: 'full'.
    optional bool load_all_detection_checkpoint_vars = 19 [default = false];

    By setting fine_tune_checkpoint_type to "full", I got the correct mAP for the first checkpoint (at step 0).
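A likely explanation for the bad step-0 score, based on my reading of the TF2 SSD meta-architecture that RetinaNet uses rather than on the comment above: with fine_tune_checkpoint_type "detection", only the feature extractor is restored from the checkpoint, and the box predictor heads start from a fresh initialization, while "full" restores the whole model, heads included. That presumably also means "full" only works if the new pipeline.config keeps the same architecture and num_classes as the exported model. The helper below is a minimal, standard-library-only sketch for flipping that field in an existing pipeline.config; set_fine_tune_checkpoint_type is a hypothetical name, and the regex assumes the field appears exactly once with a double-quoted value. If the Object Detection API package is installed, its object_detection.utils.config_util module is the more robust way to edit configs.

```python
import re

# Matches e.g.   fine_tune_checkpoint_type: "detection"
# and captures everything up to the opening quote so indentation is preserved.
_TYPE_FIELD = re.compile(r'^(\s*fine_tune_checkpoint_type:\s*)"[^"]*"', re.MULTILINE)


def set_fine_tune_checkpoint_type(config_text: str, new_type: str = "full") -> str:
    """Return config_text with fine_tune_checkpoint_type set to new_type (hypothetical helper)."""
    # A function replacement avoids re-interpreting backslashes in new_type.
    updated, count = _TYPE_FIELD.subn(lambda m: f'{m.group(1)}"{new_type}"', config_text)
    if count != 1:
        raise ValueError(f"expected one fine_tune_checkpoint_type field, found {count}")
    return updated


# Usage on a trimmed-down train_config like the one in the question.
train_config = """train_config {
  fine_tune_checkpoint: "exported-models/retinanet/checkpoint/ckpt-0"
  fine_tune_checkpoint_type: "detection"
  fine_tune_checkpoint_version: V2
}"""
print(set_fine_tune_checkpoint_type(train_config))
```

Raising when the field is missing or duplicated is deliberate: silently leaving "detection" in place would reproduce exactly the poor step-0 score described in the question.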