tensorflow, tensorflow-datasets, image-size, transfer-learning

Is it required to have a predefined image size to use transfer learning in TensorFlow?


I intend to use a pre-trained model like faster_rcnn_resnet101_pets for object detection in the TensorFlow environment, as described here.

I have collected several images for the training and testing sets, all of varying sizes. Do I have to resize them to a common size?

faster_rcnn_resnet101_pets uses ResNet, which has an input size of 224x224x3.

Does this mean I have to resize all my images before sending them for training, or is that taken care of automatically by TF?

python train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/faster_rcnn_resnet101_pets.config

In general, is it a good practice to have images of the same size?


Solution

  • No, you do not need to resize your input images to a fixed shape yourself. The TensorFlow Object Detection API has a preprocessing step that resizes all input images. Below is a function defined in that preprocessing step; note its image_resizer_fn argument, which corresponds to the image_resizer field in the config file.

    def transform_input_data(tensor_dict,
                             model_preprocess_fn,
                             image_resizer_fn,
                             num_classes,
                             data_augmentation_fn=None,
                             merge_multiple_boxes=False,
                             retain_original_image=False,
                             use_multiclass_scores=False,
                             use_bfloat16=False):
      """A single function that is responsible for all input data transformations.

      Data transformation functions are applied in the following order.
      1. If key fields.InputDataFields.image_additional_channels is present in
         tensor_dict, the additional channels will be merged into
         fields.InputDataFields.image.
      2. data_augmentation_fn (optional): applied on tensor_dict.
      3. model_preprocess_fn: applied only on image tensor in tensor_dict.
      4. image_resizer_fn: applied on original image and instance mask tensor in
         tensor_dict.
      5. one_hot_encoding: applied to classes tensor in tensor_dict.
      6. merge_multiple_boxes (optional): when groundtruth boxes are exactly the
         same they can be merged into a single box with an associated k-hot class
         label.
      ...
      """

    According to the proto file, you can choose among 4 different image resizers, namely

    1. keep_aspect_ratio_resizer
    2. fixed_shape_resizer
    3. identity_resizer
    4. conditional_shape_resizer
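
    Whichever one you pick, the API turns the image_resizer field of the config into the image_resizer_fn used above. The following is a minimal sketch of that mapping, assuming the object_detection package is installed and its protos are compiled (image_resizer_builder and image_resizer_pb2 below come from that package, and the exact return format of the built callable can differ between versions):

    # Sketch: parse an image_resizer config snippet and build the resizer callable.
    # Assumes the TF Object Detection API (object_detection) is importable.
    from google.protobuf import text_format
    from object_detection.builders import image_resizer_builder
    from object_detection.protos import image_resizer_pb2

    resizer_config = image_resizer_pb2.ImageResizer()
    text_format.Merge("""
        keep_aspect_ratio_resizer {
          min_dimension: 600
          max_dimension: 1024
        }
        """, resizer_config)

    # This callable is what the input pipeline receives as image_resizer_fn.
    image_resizer_fn = image_resizer_builder.build(resizer_config)

    Because this resizing happens inside the input pipeline, the images on disk can stay at their original sizes.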

    Here is a sample config file for the model faster_rcnn_resnet101_pets. All images are resized, keeping their aspect ratio, with min_dimension: 600 and max_dimension: 1024:

    model {
      faster_rcnn {
        num_classes: 37
        image_resizer {
          keep_aspect_ratio_resizer {
            min_dimension: 600
            max_dimension: 1024
          }
        }
        feature_extractor {
          type: 'faster_rcnn_resnet101'
          first_stage_features_stride: 16
        }
    

    In fact, the shape of the resized images has a big influence on the detection speed vs. accuracy trade-off. Although there are no specific requirements on the input image sizes, it is better to have the smallest dimension of every image larger than a reasonable value so that the convolutional operations work properly.
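
    To make the effect of min_dimension and max_dimension concrete, here is a minimal sketch in plain TensorFlow (TF2-style; an illustration of the size computation, not the API's own resizer code) of what keep_aspect_ratio_resizer roughly does to a single image:

    import tensorflow as tf

    def keep_aspect_ratio_resize(image, min_dimension=600, max_dimension=1024):
        # Scale so the shorter side reaches min_dimension, unless that would push
        # the longer side past max_dimension, in which case the longer side is
        # clamped to max_dimension instead.
        shape = tf.cast(tf.shape(image)[:2], tf.float32)
        scale = tf.minimum(min_dimension / tf.reduce_min(shape),
                           max_dimension / tf.reduce_max(shape))
        new_size = tf.cast(tf.round(shape * scale), tf.int32)
        return tf.image.resize(image, new_size)

    # Example: a 375x500 photo is scaled by 600/375 = 1.6 to 600x800 (800 stays
    # under 1024), while a 600x1800 panorama would overshoot 1024, so it is
    # scaled by 1024/1800 instead, giving roughly 341x1024.

    The API's resizer has more options (such as padding) and also resizes instance masks accordingly, so treat the sketch above only as an illustration of the size computation.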