python-2.7 tensorflow conv-neural-network batch-normalization

Dimensions in batch normalization

I'm trying to build a generalized batch normalization function in TensorFlow.

I learned about batch normalization from this article, which I found very helpful.

I have a problem with the dimensions of the scale and beta variables. In my case batch normalization is applied to every activation of each convolutional layer, so if the output of a convolutional layer is a tensor of size:

[57,57,96]

do scale and beta need to have the same dimensions as the convolutional layer output?

Here's my function. The program runs, but I don't know whether it is correct:

import tensorflow as tf

def batch_normalization_layer(batch):
    # Calculate batch mean and variance
    batch_mean, batch_var = tf.nn.moments(batch, axes=[0, 1, 2])

    # Apply the initial batch normalizing transform
    scale = tf.Variable(tf.ones([batch.get_shape()[1],batch.get_shape()[2],batch.get_shape()[3]]))
    beta = tf.Variable(tf.zeros([batch.get_shape()[1],batch.get_shape()[2],batch.get_shape()[3]]))

    normalized_batch = tf.nn.batch_normalization(batch, batch_mean, batch_var, beta, scale, 0.0001)

    return normalized_batch

Solution

From the documentation of tf.nn.batch_normalization:

mean, variance, offset and scale are all expected to be of one of two shapes:

In all generality, they can have the same number of dimensions as the input x, with identical sizes as x for the dimensions that are not normalized over (the 'depth' dimension(s)), and dimension 1 for the others which are being normalized over. mean and variance in this case would typically be the outputs of tf.nn.moments(..., keep_dims=True) during training, or running averages thereof during inference.

In the common case where the 'depth' dimension is the last dimension in the input tensor x, they may be one dimensional tensors of the same size as the 'depth' dimension. This is the case for example for the common [batch, depth] layout of fully-connected layers, and [batch, height, width, depth] for convolutions. mean and variance in this case would typically be the outputs of tf.nn.moments(..., keep_dims=False) during training, or running averages thereof during inference.
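The two accepted shapes can be illustrated with a small NumPy sketch. NumPy is used here only as a stand-in for tf.nn.moments and tf.nn.batch_normalization, and the batch size of 4 and the random input are assumptions made for illustration. Both conventions broadcast against a [batch, height, width, depth] input and should give the same result:

```python
import numpy as np

# Hypothetical activations in [batch, height, width, depth] layout.
x = np.random.RandomState(0).randn(4, 57, 57, 96)

# First shape: same rank as x, size 1 on the normalized axes
# (analogous to keep_dims=True).
mean_full = x.mean(axis=(0, 1, 2), keepdims=True)
var_full = x.var(axis=(0, 1, 2), keepdims=True)

# Second shape: one-dimensional, one value per channel
# (analogous to keep_dims=False).
mean_1d = x.mean(axis=(0, 1, 2))
var_1d = x.var(axis=(0, 1, 2))

eps = 1e-4
scale = np.ones(96)   # one scale per channel
beta = np.zeros(96)   # one offset per channel

# Broadcasting lines the trailing depth axis up with the last axis of x.
out_full = scale * (x - mean_full) / np.sqrt(var_full + eps) + beta
out_1d = scale * (x - mean_1d) / np.sqrt(var_1d + eps) + beta

print(mean_full.shape, mean_1d.shape, out_1d.shape)
```

In both cases scale and beta only need one entry per channel, because the depth axis is the only one that is not normalized over.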

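So with a [batch, height, width, depth] input normalized over axes [0, 1, 2], scale and beta should be one-dimensional tensors of size 96 (the depth), not [57, 57, 96]. Your version runs because the [57, 57, 96] variables broadcast against the input, but it learns a separate scale and offset for every spatial position rather than one per channel.

If you don't need scale and offset to be trainable, then with your values (scale=1.0 and offset=0) you can also just pass None for both.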
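Following the second case in the documentation, a minimal sketch of the function with per-channel parameters might look like the one below. It assumes the input uses the [batch, height, width, depth] layout with a statically known depth. The epsilon argument is an addition for illustration, and, as in the original, new variables are created on every call:

```python
import tensorflow as tf

def batch_normalization_layer(batch, epsilon=0.0001):
    # Size of the last ('depth') dimension, e.g. 96 for a [N, 57, 57, 96] input.
    depth = batch.get_shape().as_list()[-1]

    # Per-channel mean and variance over batch, height and width: shape [depth].
    batch_mean, batch_var = tf.nn.moments(batch, axes=[0, 1, 2])

    # One trainable scale and offset per channel.
    scale = tf.Variable(tf.ones([depth]))
    beta = tf.Variable(tf.zeros([depth]))

    return tf.nn.batch_normalization(
        batch, batch_mean, batch_var, beta, scale, epsilon)
```

The output has the same shape as the input, and the one-dimensional mean, variance, scale and beta are broadcast along the last axis.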