
CNN: Why do we first resize the image to 256 and then center crop to 224?


The transformation for AlexNet image input is below:

transforms.Resize(256),
transforms.CenterCrop(224),

Why do we first resize the image to 256 and then center crop to 224? I know that 224x224 is the standard input size for ImageNet models, but why can't we resize the image directly to 224x224?


Solution

  • Perhaps this is best illustrated visually. Consider the following image (128x128px):

[image: the original 128x128px image]

If we resized it directly to 16x16px, we'd end up with:

[image: direct 16x16px resize]

But if we resize it to 24x24px first,

[image: 24x24px resize]

    and then crop it to 16x16px, it would look like this:

[image: the 24x24px resize, center-cropped to 16x16px]

As you can see, the crop gets rid of the border while retaining detail in the center. Note the differences side by side: [images: direct 16x16px resize vs. resize-then-crop, side by side]

The same applies to resizing to 256px and then cropping to 224px; the effect is just the same at a larger resolution. See the sketch below.
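To make the difference concrete, here is a minimal sketch of the two approaches using torchvision; "input.jpg" is a hypothetical path, and the comments note what each transform does to the image:

    from PIL import Image
    from torchvision import transforms

    img = Image.open("input.jpg")  # hypothetical example image

    # Direct resize: a (height, width) tuple forces the output to exactly
    # 224x224, distorting non-square images and shrinking every detail.
    direct = transforms.Resize((224, 224))(img)

    # Resize-then-crop: a single int scales the *shorter* side to 256 while
    # preserving the aspect ratio; CenterCrop then keeps the central 224x224
    # region, discarding the border instead of squashing the image.
    resize_then_crop = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
    ])
    cropped = resize_then_crop(img)

    print(direct.size, cropped.size)  # both (224, 224), but only the second
                                      # keeps the original proportions

Because Resize(256) leaves the aspect ratio untouched, the subsequent CenterCrop is also what guarantees a square output for non-square inputs, without stretching the image.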