
Deeplabv3 re-train result is skewed for non-square images

I have issues fine-tuning the pretrained model deeplabv3_mnv2_pascal_train_aug in Google Colab.

When I run the visualization, the predicted masks appear displaced toward the upper-left corner whenever the image is not square, that is, when its height or width is larger than the other dimension.

The dataset used for fine-tuning is Look Into Person (LIP). The steps I followed are:

  1. Create dataset in deeplab/datasets/
_LIP_INFORMATION = DatasetDescriptor(
        'train': 30462,
        'train_aug': 10582,
        'trainval': 40462,
        'val': 10000,

    'cityscapes': _CITYSCAPES_INFORMATION,
    'pascal_voc_seg': _PASCAL_VOC_SEG_INFORMATION,
    'ade20k': _ADE20K_INFORMATION,
    'cihp': _CIHP_INFORMATION,
    'lip': _LIP_INFORMATION,
  2. Conversion to tfrecord
!python models/research/deeplab/datasets/ \
  --image_folder="/content/drive/MyDrive/TFM/lip_trainval_images/TrainVal_images/train_images" \
  --semantic_segmentation_folder="/content/drive/MyDrive/TFM/lip_trainval_segmentations/TrainVal_parsing_annotations/train_segmentations" \
  --list_folder="/content/drive/MyDrive/TFM/lip_trainval_images" \
  --image_format="jpg" \
!python models/research/deeplab/datasets/ \
  --image_folder="/content/drive/MyDrive/TFM/lip_trainval_images/TrainVal_images/val_images" \
  --semantic_segmentation_folder="/content/drive/MyDrive/TFM/lip_trainval_segmentations/TrainVal_parsing_annotations/val_segmentations" \
  --list_folder="/content/drive/MyDrive/TFM/lip_trainval_images" \
  --image_format="jpg" \
  3. Training
!python deeplab/ --logtostderr \
  --training_number_of_steps=40000 \
  --train_split="train" \
  --model_variant="mobilenet_v2" \
  --atrous_rates=6 \
  --atrous_rates=12 \
   --atrous_rates=18 \
   --output_stride=16 \
   --decoder_output_stride=4 \
   --train_batch_size=1 \
   --dataset="lip" \
   --train_logdir="/content/drive/MyDrive/TFM/checkpoint_lip_mobilenet" \
   --dataset_dir="/content/drive/MyDrive/TFM/trainval_lip_tfrecord/" \
   --fine_tune_batch_norm=false \
   --initialize_last_layer=false \
  4. Visualization
!python deeplab/ --logtostderr \
  --atrous_rates=6 \
  --atrous_rates=12 \
   --atrous_rates=18 \
   --output_stride=16 \
   --decoder_output_stride=4 \
   --dataset="lip" \
   --checkpoint_dir="/content/drive/MyDrive/TFM/checkpoint_lip_mobilenet" \
   --vis_logdir="/content/drive/My Drive/TFM/eval_results_lip" \
   --dataset_dir="/content/drive/My Drive/TFM/trainval_lip_tfrecord" \
   --max_number_of_iterations=1 \

After the steps above, here is an example of the problem I'm facing:

Original image

Deeplabv3 result

I don't know if I'm missing something important or if the model just needs more training. However, more training does not seem to be the solution, since the loss is currently oscillating between approximately 0.5 and 1.5.

Thanks in advance.


  • After some time, I did find a solution for this problem. An important thing to know is that, by default, train_crop_size and vis_crop_size are 513x513.

    The issue was that vis_crop_size was smaller than some of the input images, so vis_crop_size needs to be greater than the largest dimension of the biggest image.

    The same logic applies to any other script that takes a crop size: set it large enough that your masks are not cropped to 513 by default.
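
As a rough sketch of how such a crop size could be chosen: as far as I recall, the DeepLab FAQ recommends a value of the form output_stride * k + 1 that is slightly larger than the largest image dimension (for example 513 for images up to 512 pixels). The helper names below (`min_crop_dim`, `vis_crop_size_for`) and the example image sizes are hypothetical, not part of DeepLab; the image sizes would normally come from scanning your own dataset.

```python
def min_crop_dim(largest_dim, output_stride=16):
    """Smallest value of the form output_stride * k + 1 that is
    strictly greater than largest_dim (assumed rule, see lead-in)."""
    k = (largest_dim - 1) // output_stride + 1
    return output_stride * k + 1


def vis_crop_size_for(image_sizes, output_stride=16):
    """Return (crop_height, crop_width) covering every image.

    image_sizes is an iterable of (width, height) pairs, e.g. as
    reported by an image library for each file in the dataset.
    """
    sizes = list(image_sizes)
    max_width = max(w for w, _ in sizes)
    max_height = max(h for _, h in sizes)
    return (min_crop_dim(max_height, output_stride),
            min_crop_dim(max_width, output_stride))


# Hypothetical (width, height) pairs standing in for a real dataset scan.
example_sizes = [(500, 375), (640, 480), (300, 720)]
crop_h, crop_w = vis_crop_size_for(example_sizes)
print(f"height={crop_h}, width={crop_w}")
```

The crop size is given as height first, then width. How the two values are passed on the command line depends on the DeepLab checkout (a single comma-separated flag in some versions, the flag repeated once per value in others), so check the flag definition in your copy of the visualization script.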