Tags: tensorflow, neural-network, deep-learning, convolution, batch-normalization

Can we use batch normalization with transfer learning for an instance with a different data distribution?


This tutorial has a TensorFlow implementation of the batch normalization layer for the training and testing phases.

When we use transfer learning, is it OK to use batch normalization layers? Especially when the data distributions are different?

In the inference phase, the BN layer does not use the mini-batch statistics; it uses a fixed mean and variance (which were computed from the training distribution). So if our model sees data with a different distribution, can it give wrong results?
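
To make the premise concrete, here is a minimal sketch (assuming TensorFlow 2.x with tf.keras in eager mode; the layer size, momentum, and the two distributions are made up for illustration) showing that a batch normalization layer keeps using the moving statistics of the training distribution at inference time:

```python
import numpy as np
import tensorflow as tf

bn = tf.keras.layers.BatchNormalization(momentum=0.9)

# "Training" data drawn from one distribution (mean ~5, std ~2).
x_train = np.random.normal(loc=5.0, scale=2.0, size=(256, 8)).astype("float32")

# Calling the layer with training=True normalizes with the current batch
# statistics and updates the moving averages.
for _ in range(100):
    _ = bn(x_train, training=True)

print("moving mean  ~", bn.moving_mean.numpy()[:3])      # close to 5
print("moving var   ~", bn.moving_variance.numpy()[:3])  # close to 4

# "New domain" data with a different distribution (mean ~0, std ~1).
x_new = np.random.normal(loc=0.0, scale=1.0, size=(256, 8)).astype("float32")

# At inference (training=False) the layer still uses the moving statistics
# from the training distribution, so the output is not zero-mean here.
y = bn(x_new, training=False)
print("output mean on new data:", float(tf.reduce_mean(y)))  # roughly -2.5
```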


Solution

  • With transfer learning, you're transferring the learned parameters from one domain to another. Usually, this means that you keep the learned values of the convolutional layers fixed while adding new fully connected layers that learn to classify the features extracted by the CNN.

    When you add batch normalization to a layer, you're using statistics sampled from the input distribution of that layer in order to force its output to be (approximately) normally distributed. To do that, you compute an exponential moving average of the layer output's mean (and variance) during training, and in the testing phase you subtract this moving mean from the layer output (and scale by the moving variance).

    Although data dependent, these mean values (one per convolutional layer) are computed on the output of the layer, i.e. on the transformation the layer has learned.

    Thus, in my opinion, the various averages that the BN layers subtract from their convolutional layer outputs are general enough to be transferred: they are computed on the transformed data and not on the original data. Moreover, convolutional layers learn to extract local patterns, so they are more robust and harder to throw off.

    Thus, in short and in my opinion:

    You can apply transfer learning to convolutional layers with batch norm applied. But on fully connected layers, the influence of the computed statistics (which see the whole input and not only local patches) can be too data dependent, so I'd avoid it there. A sketch of the first setup follows below.

    However, as a rule of thumb: if you're unsure about something, just try it and see if it works!
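
As a rough illustration of the recommended setup, here is a minimal sketch (assuming TensorFlow 2.x with tf.keras; MobileNetV2, the 224x224 input size, and the 10-class head are placeholder choices, not something from the original post) that keeps the pre-trained convolutional base, including its batch-norm statistics, frozen and adds a new fully connected head without batch normalization:

```python
import tensorflow as tf

# Pre-trained convolutional base; its weights and BN moving statistics
# come from ImageNet, not from the new domain's data.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # freeze conv weights and BN parameters

inputs = tf.keras.Input(shape=(224, 224, 3))
# training=False keeps the BN layers in inference mode, so they keep
# using their stored moving mean/variance while the new head is trained.
x = base(inputs, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
# New fully connected head, learned on the new domain, without batch norm.
x = tf.keras.layers.Dense(256, activation="relu")(x)
outputs = tf.keras.layers.Dense(10, activation="softmax")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(new_domain_dataset, epochs=...)  # trains only the new head
```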