Tags: python, tensorflow, object-detection, object-detection-api

What does "keep_aspect_ratio_resizer" do in the config file of the TensorFlow Object Detection API?


I am using the TensorFlow Object Detection API (GitHub: tensorflow/models) to train a Faster R-CNN model.

What kind of resizing does "keep_aspect_ratio_resizer" in the config file perform?

I prepared images of 1920 x 1080 pixels and set both "min_dimension:" and "max_dimension:", which appear immediately inside "keep_aspect_ratio_resizer {" in the config file, to 768.

In this case, I assume the 1920 x 1080 image is resized to 768 x 768 pixels before being fed to the CNN. Is the original aspect ratio (16:9) maintained? That is, is the long side of the image scaled to 768 pixels, with black bars added to fill the remaining margin?

Or does the aspect ratio change from 16:9 to 1:1, so that the image becomes distorted?

If anyone knows about this, please let me know.

Thank you!


Solution

  • The definitions of the fields in the configuration files can be found in the proto files here: https://github.com/tensorflow/models/tree/master/research/object_detection/protos

    The keep_aspect_ratio_resizer field is defined in image_resizer.proto, which states the following:

    // Configuration proto for image resizer that keeps aspect ratio.
    message KeepAspectRatioResizer {
      // Desired size of the smaller image dimension in pixels.
      optional int32 min_dimension = 1 [default = 600];
    
      // Desired size of the larger image dimension in pixels.
      optional int32 max_dimension = 2 [default = 1024];
    
      // Desired method when resizing image.
      optional ResizeType resize_method = 3 [default = BILINEAR];
    
      // Whether to pad the image with zeros so the output spatial size is
      // [max_dimension, max_dimension]. Note that the zeros are padded to the
      // bottom and the right of the resized image.
      optional bool pad_to_max_dimension = 4 [default = false];
    
      // Whether to also resize the image channels from 3 to 1 (RGB to grayscale).
      optional bool convert_to_grayscale = 5 [default = false];
    
      // Per-channel pad value. This is only used when pad_to_max_dimension is True.
      // If unspecified, a default pad value of 0 is applied to all channels.
      repeated float per_channel_pad_value = 6;
    }
    

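    So whether padding (black bars) is added is your choice: set pad_to_max_dimension: true in your config file and the resized image is padded with zeros at the bottom and right up to [max_dimension, max_dimension]. With or without padding, the aspect ratio of the image content is kept; the image is not stretched to 1:1.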
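
To make the effect on a 1920 x 1080 image concrete, here is a minimal Python sketch of the size calculation implied by the proto comments. The function name `keep_aspect_ratio_size` is hypothetical and not part of the Object Detection API; the logic is an assumption about how the resizer picks a scale (reach min_dimension on the smaller side unless that would push the larger side past max_dimension), and the real implementation may differ in rounding details.

```python
def keep_aspect_ratio_size(height, width,
                           min_dimension=600, max_dimension=1024,
                           pad_to_max_dimension=False):
    """Hypothetical helper: estimate (content_size, output_size) as (height, width).

    Assumed behavior, not the library's actual code:
    scale the smaller side to min_dimension, but if the larger side would
    then exceed max_dimension, scale the larger side to max_dimension instead.
    """
    scale = min_dimension / min(height, width)
    if round(max(height, width) * scale) > max_dimension:
        scale = max_dimension / max(height, width)

    # Size of the actual image content after resizing (aspect ratio kept).
    content = (round(height * scale), round(width * scale))

    # With padding, zeros fill the bottom and right up to a square output.
    output = (max_dimension, max_dimension) if pad_to_max_dimension else content
    return content, output


if __name__ == "__main__":
    # The case from the question: 1920 x 1080 image, both dimensions set to 768.
    print(keep_aspect_ratio_size(1080, 1920, 768, 768))
    print(keep_aspect_ratio_size(1080, 1920, 768, 768, pad_to_max_dimension=True))
```

Under these assumptions, the image content ends up 768 wide and 432 high in both cases. Without padding that is also the tensor size; with pad_to_max_dimension: true the tensor is 768 x 768 and the lower 336 rows are zeros.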