
Detectron2 what is difference between INPUT.MIN_SIZE and INPUT.MAX_SIZE


In Detectron2 training, the config lets us define INPUT.MIN_SIZE and INPUT.MAX_SIZE to tell Detectron2 what image resolution to scale to in terms of width (height is determined by Detectron2). I understand this scaling is done by the shortest-edge resize function.

My question is what is the difference between these two variables and how do they work in this function?

Let's say I have training images of 1280 x 720 but also training images of 2560 x 1440. I want to scale all images to a minimum of 1280 x 720, so I would set INPUT.MIN_SIZE to (720,). What would I then set the max variable to, and how would the algorithm handle this?


Solution

  • The answer is exactly what the ResizeShortestEdge documentation says:

    It attempts to scale the shorter edge to the given short_edge_length, as long as the longer edge does not exceed max_size. If max_size is reached, then downscale so that the longer edge does not exceed max_size.

    You can also see this here in the source code where it calculates the new shape (pasted below for easy reference).

        from typing import Tuple

        def get_output_shape(
            oldh: int, oldw: int, short_edge_length: int, max_size: int
        ) -> Tuple[int, int]:
            """
            Compute the output size given input size and target short edge length.
            """
            h, w = oldh, oldw
            size = short_edge_length * 1.0
            # Scale so that the shorter edge becomes short_edge_length.
            scale = size / min(h, w)
            if h < w:
                newh, neww = size, scale * w
            else:
                newh, neww = scale * h, size
            # If the longer edge would exceed max_size, shrink both edges again.
            if max(newh, neww) > max_size:
                scale = max_size * 1.0 / max(newh, neww)
                newh = newh * scale
                neww = neww * scale
            # Round to the nearest integer pixel.
            neww = int(neww + 0.5)
            newh = int(newh + 0.5)
            return (newh, neww)
    

    So, in your case, you would set short_edge_length to 720 and max_size to 1280 (in the config, INPUT.MIN_SIZE_TRAIN = (720,) and INPUT.MAX_SIZE_TRAIN = 1280). With those values, the code above does the following:

    a) First, a scaling factor is calculated at the line scale = size / min(h, w). For a 2560 x 1440 image it is 720 / 1440 = 0.5; for a 1280 x 720 image it is 720 / 720 = 1.0.

    b) Then the new width and height are calculated as 1280 and 720 respectively at the line newh, neww = size, scale * w, which is the branch taken because h < w for a landscape image.

    c) Lastly, the condition if max(newh, neww) > max_size: is not satisfied (1280 is not greater than 1280), so a new scaling factor is not calculated and the new width and height are not updated.

    Note that you can also set max_size higher than 1280, and it will make no difference for these images. If you set it lower than 1280, though, the condition in (c) becomes true, so a new scale is calculated and both dimensions are downscaled until the longer edge equals max_size.
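
The arithmetic in steps (a) to (c) can be checked with a small standalone sketch. The function name shortest_edge_shape is a hypothetical stand-in that mirrors the logic quoted above; it is not imported from Detectron2, and the max_size values of 2000 and 1000 are chosen only for illustration.

```python
from typing import Tuple


def shortest_edge_shape(
    oldh: int, oldw: int, short_edge_length: int, max_size: int
) -> Tuple[int, int]:
    """Hypothetical standalone copy of the shape logic quoted in the answer."""
    size = short_edge_length * 1.0
    # Step (a): scale so the shorter edge becomes short_edge_length.
    scale = size / min(oldh, oldw)
    # Step (b): apply the scale to both edges.
    if oldh < oldw:
        newh, neww = size, scale * oldw
    else:
        newh, neww = scale * oldh, size
    # Step (c): cap the longer edge at max_size, keeping the aspect ratio.
    if max(newh, neww) > max_size:
        cap = max_size * 1.0 / max(newh, neww)
        newh, neww = newh * cap, neww * cap
    return int(newh + 0.5), int(neww + 0.5)


# Arguments are (height, width, short_edge_length, max_size).
small = shortest_edge_shape(720, 1280, 720, 1280)     # 1280 x 720 input
large = shortest_edge_shape(1440, 2560, 720, 1280)    # 2560 x 1440 input
loose = shortest_edge_shape(1440, 2560, 720, 2000)    # larger max_size
capped = shortest_edge_shape(1440, 2560, 720, 1000)   # max_size below 1280

print(small, large, loose, capped)
# (720, 1280) (720, 1280) (720, 1280) (563, 1000)
```

Both image sizes end up at 720 x 1280 (height x width), and raising max_size changes nothing. When max_size is below 1280, the longer edge is pinned to max_size and the shorter edge lands below the requested 720, so short_edge_length behaves as a target rather than a guarantee.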