tensorflow, neural-network, conv-neural-network, batch-normalization

Why batch_normalization in tensorflow does not give expected results?


I would like to see the output of a batch_normalization layer in a small example, but apparently I am doing something wrong, because I get the same output as the input.

import tensorflow as tf
import numpy as np
import keras.backend as K
K.set_image_data_format('channels_last')

X = tf.placeholder(tf.float32, shape=(None, 2, 2, 3))  # samples are 2x2 images with 3 channels
outp = tf.layers.batch_normalization(inputs=X, axis=3)

x = np.random.rand(4, 2, 2, 3)  # sample set: 4 images

init_op = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init_op)
    K.set_session(sess)
    a = sess.run(outp, feed_dict={X:x, K.learning_phase(): 0})
    print(a-x) # print the difference between input and normalized output

The input and output of the above code are almost identical. Can anyone point out the problem to me?


Solution

  • Remember that batch_normalization behaves differently at training and test time. Here, you have never "trained" your batch normalization, so the moving mean it uses is still at its initial value of 0 and the moving variance at its initial value of 1, which makes the output almost identical to the input. If you feed K.learning_phase(): 1 you will already see some differences, because the layer then normalizes with the current batch's mean and standard deviation; and if you first train on many examples and then test on others, you will also see the normalization occurring, because the learnt mean and standard deviation will no longer be 0 and 1.

    To see the effects of batch norm more clearly, I'd also suggest multiplying your input by a large number (say 100), so that there is a clear difference between the unnormalized and normalized vectors; that will make it easier to check what's going on. A small NumPy sketch of what the train-mode normalization computes follows below.
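
    To make the train-time behaviour concrete, here is a small NumPy sketch of mine (not part of the original question or answer) of what the layer computes with training=True, center=False and scale=False: each channel is normalized with the mean and variance taken over the batch and spatial axes.

    import numpy as np

    x = np.random.rand(4, 2, 2, 3) * 100          # same shape as in the question: (N, H, W, C)
    eps = 1e-3                                     # default epsilon of tf.layers.batch_normalization

    mean = x.mean(axis=(0, 1, 2), keepdims=True)   # per-channel mean over batch and spatial dims
    var = x.var(axis=(0, 1, 2), keepdims=True)     # per-channel (biased) variance
    x_norm = (x - mean) / np.sqrt(var + eps)       # train-mode output without the learnt gamma/beta

    print(x_norm.mean(axis=(0, 1, 2)))             # ~0 for every channel
    print(x_norm.std(axis=(0, 1, 2)))              # ~1 for every channel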

    EDIT: In your code as it stands, the moving mean and moving variance are never updated. You need to make sure the update ops are run, as indicated in batch_normalization's documentation. The following lines should make it work:

    outp = tf.layers.batch_normalization(inputs=X, axis=3, training=is_training, center=False, scale=False)

    # make the moving-average update ops run whenever `outp` is evaluated
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        outp = tf.identity(outp)

    Below is my full working code (I got rid of Keras because I don't know it well, but you should be able to re-add it).

    import tensorflow as tf
    import numpy as np
    
    X = tf.placeholder(tf.float32, shape=(None, 2, 2, 3))  # samples are 2x2 images with 3 channels
    is_training = tf.placeholder(tf.bool, shape=())  # switches between train-time and test-time behaviour
    outp = tf.layers.batch_normalization(inputs=X, axis=3, training=is_training, center=False, scale=False)
    
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        outp = tf.identity(outp)
    
    x = np.random.rand(4, 2, 2, 3) * 100  # sample set: 4 images
    
    init_op = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init_op)
        initial = sess.run(outp, feed_dict={X:x, is_training: False})
        for i in range(10000):
            a = sess.run(outp, feed_dict={X:x, is_training: True})
            if (i % 1000 == 0):
                print("Step %i: " %i, a-x) # print the difference between input and normalized output
    
        final = sess.run(outp, feed_dict={X: x, is_training: False})
        print("initial: ", initial)
        print("final: ", final)
        assert not np.array_equal(initial, final)
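
        # Addition of mine (not in the original answer): a sanity check that reads back the learnt
        # moving statistics and compares them with the batch statistics of x.  It assumes the default
        # variable scope "batch_normalization" created by tf.layers.batch_normalization.
        moving_vars = sorted(
            (v for v in tf.global_variables()
             if 'moving_mean' in v.name or 'moving_variance' in v.name),
            key=lambda v: v.name)
        moving_mean, moving_variance = sess.run(moving_vars)
        print("moving mean:     ", moving_mean)      # should be close to x.mean(axis=(0, 1, 2))
        print("moving variance: ", moving_variance)  # should be close to x.var(axis=(0, 1, 2))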