machine-learning computer-vision convolution batch-normalization

Why batch normalization over channels only in CNN

I am wondering, if in Convolutional Neural Networks batch normalization should be applied with respect to every pixel separately, or should I take the mean of pixels with respect to each channel?

I saw that in the description of Tensorflow's tf.layers.batch_normalization it is suggested to perform bn with respect to the channels, but if I recall correctly, I have used the other approach with good results.

Solution

As far as I know, in feed-forward (dense) layers one applies batch normalization per each unit (neuron), because each of them has its own weights. Therefore, you normalize across feature axis.

But, in convolutional layers, the weights are shared across inputs, i.e., each feature map applies the same transformation to a different input's "volume". Therefore, you apply batch normalization using mean and variance per feature map, NOT per unit/neuron.

That's why I guess there is a difference in axis parameter value.