keras, conv-neural-network, batch-normalization, momentum

Why is there just one momentum parameter in Keras BatchNorm?


I am new to CNNs and was implementing batch normalization in a CNN using Keras. The batch norm layer has 4 * feature_maps (of the previous layer) parameters, which are as follows:

  1. Two are gamma and beta (the learnable scale and shift).
  2. The other two are the exponential moving averages of the mean and variance of the mini-batches (see the sketch after this list).

Now, the exponential moving averages of the mean and variance are defined as:

 running_mean = momentum * running_mean + (1 - momentum) * sample_mean
 running_var = momentum * running_var + (1 - momentum) * sample_var

In Keras's BatchNormalization layer, I saw that there is just one hyperparameter named momentum:

BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001, center=True, scale=True, beta_initializer='zeros', gamma_initializer='ones', moving_mean_initializer='zeros', moving_variance_initializer='ones', beta_regularizer=None, gamma_regularizer=None, beta_constraint=None, gamma_constraint=None, **kwargs)

My question is: why are there not separate momentum hyperparameters for the running mean and the running variance?


Solution

  • The exponential moving averages of the mean and variance are something that these frameworks take care of under the hood. The running mean and variance are not learnable parameters, so they are never trained by gradient descent; the momentum used in calculating those two values (the exponential moving averages of the mean and variance) is simply the one set on the BatchNormalization layer, applied to both. Since the running mean and variance are only tracked, not tuned, there is no reason to expose a separate momentum hyperparameter for each of them. A quick sketch at the end of this answer shows both statistics being updated with the single momentum.

    You could also check out this thread for more insight: moving mean and std in Keras batch norm.