
CNN: Why do we first resize the image to 256 and then center crop to 224?


The transformation for AlexNet image input is below:

transforms.Resize(256),
transforms.CenterCrop(224),

Why do we first resize the image to 256 and then center crop to 224? I know that 224x224 is the standard input size for ImageNet models, but why can't we resize the image directly to 224x224?


Solution

  • Perhaps this is best illustrated visually. Consider the following image (128x128px):

[image: the original 128x128px image]

If we resized it directly to 16x16px, we'd end up with:

[image: direct 16x16px resize]

But if we resize it to 24x24px first,

[image: 24x24px resize]

    and then crop it to 16x16px, it would look like this:

[image: the 24x24px resize, center-cropped to 16x16px]

As you can see, the crop gets rid of the border while retaining detail in the center. Note the differences side by side: [images: direct 16x16px resize vs. resize-then-crop, side by side]

The same applies to resizing to 256px and then cropping to 224px; the effect is just the same at a larger resolution. See the sketch below.
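To make the difference concrete, here is a minimal sketch of the two approaches using torchvision; "input.jpg" is a hypothetical path, and the comments note what each transform does to the image:

    from PIL import Image
    from torchvision import transforms

    img = Image.open("input.jpg")  # hypothetical example image

    # Direct resize: a (height, width) tuple forces the output to exactly
    # 224x224, distorting non-square images and shrinking every detail.
    direct = transforms.Resize((224, 224))(img)

    # Resize-then-crop: a single int scales the *shorter* side to 256 while
    # preserving the aspect ratio; CenterCrop then keeps the central 224x224
    # region, discarding the border instead of squashing the image.
    resize_then_crop = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
    ])
    cropped = resize_then_crop(img)

    print(direct.size, cropped.size)  # both (224, 224), but only the second
                                      # keeps the original proportions

Because Resize(256) leaves the aspect ratio untouched, the subsequent CenterCrop is also what guarantees a square output for non-square inputs, without stretching the image.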