I'm trying to retrain an existing pretrained network from the TensorFlow Object Detection API: ssd_mobilenet_v2, pre-trained on the COCO dataset. I followed the steps in the tutorial that accompanies the Object Detection API.
The model does start training, but the mAP is very low. I'm new to CNNs, so any help is appreciated.
When I start training, the warnings shown further down appear, and I can't find a fix.
I'm running it in a Google Colaboratory notebook with this command:
# Training
!python object_detection/model_main.py \
--pipeline_config_path=${PIPELINE_CONFIG_PATH} \
--model_dir=${MODEL_DIR} \
--num_train_steps=${NUM_TRAIN_STEPS} \
--sample_1_of_n_eval_examples=$SAMPLE_1_OF_N_EVAL_EXAMPLES \
--alsologtostderr
These are the warnings I get:
WARNING:root:Variable [FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_2_3x3_s2_512/weights] is available in checkpoint, but has an incompatible shape with model variable. Checkpoint shape: [[1, 1, 256, 512]], model variable shape: [[3, 3, 256, 512]]. This variable will not be initialized from the checkpoint.
WARNING:root:Variable [FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_3_3x3_s2_256/weights] is available in checkpoint, but has an incompatible shape with model variable. Checkpoint shape: [[1, 1, 128, 256]], model variable shape: [[3, 3, 128, 256]]. This variable will not be initialized from the checkpoint.
WARNING:root:Variable [FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_4_3x3_s2_256/weights] is available in checkpoint, but has an incompatible shape with model variable. Checkpoint shape: [[1, 1, 128, 256]], model variable shape: [[3, 3, 128, 256]]. This variable will not be initialized from the checkpoint.
WARNING:root:Variable [FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_5_3x3_s2_128/weights] is available in checkpoint, but has an incompatible shape with model variable. Checkpoint shape: [[1, 1, 64, 128]], model variable shape: [[3, 3, 64, 128]]. This variable will not be initialized from the checkpoint.
After running for about 10 minutes, it prints this:
Accumulating evaluation results...
DONE (t=1.73s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.002
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.006
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.040
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.002
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.026
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.050
I haven't changed the *.ckpt files; I just downloaded the original pretrained ssd_mobilenet_v2_coco_2018_03_29 archive and pointed the .config file at its checkpoint files.
I've been trying to figure this out for more than a day. Thank you for any help.
I recently ran into the same issue as Miroslav (the exact same four warning messages). While @GPhilo is right that this warning means the checkpoint doesn't match the model, it appears that there was a problem with how this specific pre-trained checkpoint was generated. Specifically, the ssd_mobilenet_v2_coco_2018_03_29.tar.gz checkpoint seems to have been generated using a pre-release version of the config file. Here is the link to the related issue on GitHub:
https://github.com/tensorflow/models/issues/5315
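To see exactly where the checkpoint and the model disagree, the warnings from the question can be parsed and compared. This is only a rough sketch: the helper names parse_shape_warnings and differing_axes are made up for this answer, and the regular expression assumes the log format quoted in the question (other TensorFlow versions may word the message differently).

```python
import re

# Assumption: the warning text has exactly the format quoted in the question.
WARNING_RE = re.compile(
    r"Variable \[(?P<name>[^\]]+)\] is available in checkpoint, but has an "
    r"incompatible shape with model variable\. "
    r"Checkpoint shape: \[\[(?P<ckpt>[^\]]*)\]\], "
    r"model variable shape: \[\[(?P<model>[^\]]*)\]\]"
)


def _to_shape(text):
    """Turn a string such as '1, 1, 256, 512' into the tuple (1, 1, 256, 512)."""
    return tuple(int(dim) for dim in text.split(","))


def parse_shape_warnings(log_text):
    """Return (variable name, checkpoint shape, model shape) for each warning."""
    return [
        (m.group("name"), _to_shape(m.group("ckpt")), _to_shape(m.group("model")))
        for m in WARNING_RE.finditer(log_text)
    ]


def differing_axes(ckpt_shape, model_shape):
    """Return the indices of the axes on which the two shapes disagree."""
    return [i for i, (c, m) in enumerate(zip(ckpt_shape, model_shape)) if c != m]


# One of the warning lines from the question, used as sample input.
sample_log = (
    "WARNING:root:Variable [FeatureExtractor/MobilenetV2/"
    "layer_19_2_Conv2d_2_3x3_s2_512/weights] is available in checkpoint, "
    "but has an incompatible shape with model variable. "
    "Checkpoint shape: [[1, 1, 256, 512]], "
    "model variable shape: [[3, 3, 256, 512]]. "
    "This variable will not be initialized from the checkpoint."
)

for name, ckpt, model in parse_shape_warnings(sample_log):
    print(name, ckpt, model, differing_axes(ckpt, model))
```

In all four warnings from the question only the first two axes (the kernel height and width) differ: the checkpoint stores 1x1 kernels where the model built from the repo config expects 3x3. The channel counts agree, which fits the explanation that the checkpoint was produced from a different version of the config.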
In the end, I switched from the ssd_mobilenet_v2_coco.config file in the tensorflow/models git repo to the pipeline.config file included with the pre-trained checkpoint. Besides the normal settings that need changing, you also need to remove the batch_norm_trainable flag. More info on this bug is here:
https://github.com/tensorflow/models/issues/4066
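If you would rather not edit the file by hand, removing the flag can be scripted. The following is a minimal sketch, not part of the Object Detection API: remove_config_field is a hypothetical helper, and it assumes the flag sits on its own line in the form "batch_norm_trainable: true", which is how I remember it in the shipped pipeline.config. Check your copy before relying on it.

```python
def remove_config_field(config_text, field="batch_norm_trainable"):
    """Drop every line that sets `field` in a text-format pipeline config.

    Assumption: the field occupies a whole line by itself ("name: value").
    A field that shares a line with other content would not be handled.
    """
    kept = [
        line
        for line in config_text.splitlines()
        if not line.strip().startswith(field + ":")
    ]
    return "\n".join(kept) + "\n"


# Tiny hand-written fragment for illustration; not the real pipeline.config.
sample_config = (
    "feature_extractor {\n"
    '  type: "ssd_mobilenet_v2"\n'
    "  batch_norm_trainable: true\n"
    "}\n"
)

print(remove_config_field(sample_config))
```

To apply it to a real file, read pipeline.config into a string, pass it through the function, and write the result back. The other settings (number of classes, checkpoint path, input and label map paths) still have to be edited as usual.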
Note: My first attempt was to switch to the quantized version of MobileNet V2 SSD, but I didn't get the accuracy that I hoped for after re-training the model with my data set (not sure why).