Why is an MNIST-trained model bad at digits that are not in the center of the picture?

About the problem

My CNN model reaches up to 99.4% accuracy on the MNIST dataset, so I tried some irregular inputs. The predicted results were not correct.

The following are some of the irregular inputs I used

As we know, a CNN's convolution scans the whole image and does not care which area of the image the key features are in.

Why can't the CNN deal with irregular input?

Solution

As we know, a CNN's convolution scans the whole image and does not care which area of the image the key features are in.

This is simply false. A CNN does not "scan" the image. A single filter can be seen as scanning, but the whole network cannot. A CNN is composed of many layers, which progressively reduce the amount of information, and at some point it also uses location-specific features (in the final fully connected layers, in some forms of global averaging, and so on). Consequently, while CNNs are robust to small perturbations (small translations or noise, but not rotations!), they are not invariant to these transformations. In other words, moving an image 3 pixels to the left is fine, but trying to classify a digit at a completely different scale or position will fail, because nothing forces your model to be invariant to that. Some models that do learn these kinds of invariances are Spatial Transformer Networks, but plain CNNs simply don't.
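To make the distinction concrete, here is a minimal NumPy sketch, not the asker's actual model. It assumes a toy 28x28 image with a small blob standing in for a digit, one random 3x3 filter, and a hypothetical dense layer (`dense_weights`) applied to the flattened feature map. It illustrates that the filter output merely moves along with the input, while the position-specific dense layer produces a different score once the blob is far from where it was.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Plain 'valid' cross-correlation, as a single CNN filter computes it."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)

# Toy 28x28 "digit": a small bright blob near the centre, zeros elsewhere.
image = np.zeros((28, 28))
image[12:16, 12:16] = rng.random((4, 4)) + 0.5

# The same blob moved 8 pixels down and 8 pixels right (no wrap-around occurs).
shifted = np.roll(image, shift=(8, 8), axis=(0, 1))

kernel = rng.standard_normal((3, 3))           # one random filter (assumption)
dense_weights = rng.standard_normal(26 * 26)   # hypothetical dense layer weights

fmap = conv2d_valid(image, kernel)
fmap_shifted = conv2d_valid(shifted, kernel)

# A single filter is translation-equivariant: its output shifts with the input.
equivariant = np.allclose(np.roll(fmap, shift=(8, 8), axis=(0, 1)), fmap_shifted)

# The dense layer ties each weight to a fixed position, so the score changes.
score = fmap.ravel() @ dense_weights
score_shifted = fmap_shifted.ravel() @ dense_weights

print("feature map is just a shifted copy:", equivariant)
print("dense score, centred blob:", score)
print("dense score, shifted blob:", score_shifted)
```

In a trained network the dense weights are fitted to centred MNIST digits, so activations arriving at unusual positions hit weights that were never trained to respond to them. That is the effect described above.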